Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document endian-dependent behavior, if any #393

Closed
gnzlbg opened this issue Mar 22, 2018 · 4 comments
Closed

Document endian-dependent behavior, if any #393

gnzlbg opened this issue Mar 22, 2018 · 4 comments

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 22, 2018

@sunfish raised the point that we should specify how portable packed SIMD vector types behave with respect to the endianness of the architecture.

The order of the elements within a vector depends on the endianness of the machine. This could not only affect the results one gets from the indexed operations (extract,store,replace) and memory load and stores, but also the results of bit casts, for example, when bit-casting i8x16toi16x8thei8s might be ordered differently inside each i16` lanes on big endian machines, affecting the results.

The weird thing is that the portable vector type tests are running on PowerPC64 and PowerPC64el and all 3000 tests currently pass. These tests are pretty generic, so maybe we are unlucky and these tests are not hitting any endianness issues.

Or maybe LLVM handles this for us already somehow? We should add a couple of tests that we expect to break in either big-endian or little-endian and see what happens.

If LLVM handles this for us, we should investigate what exactly LLVM does and document that.


If LLVM does not handle this for us, there are many ways about how to proceed:

  • make endianness machine specific: this would make the vector types not
    portable anymore, because asserts like those above would fail or pass dependin
    on endianness.

  • renumber indices: if extract, store, and replace would take compile-time
    indices, we could easily re-number them in big-endian machines to make the
    asserts in the code above pass.

but IMO we should figure out if LLVM handles this for us first before spending time thinking about any hypothetical solutions. If someone has access to real big-endian hardware, it would be nice if that person could run stdsimd tests and report if whatever tests we come up with here pass or break.


IIUC, the following asserts might pass/fail depending on endianness:

let v = i32x4::new(0, 1, 2, 3);
assert_eq!(v.extract(0), 0); // OK in LE - ERROR in BE
assert_eq!(v.extract(3), 0); // ERROR in LE - OK in BE

let x = i8x16::new(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
let t: i16x8 = x.into_bits(); // mem::transmute 
let t_el = i16x8::new(256, 770, 1284, 1798, 2312, 2826, 3340, 3854);
assert_eq!(t, t_el); // OK LE; ERROR: BE

cc @hsivonen who is more familiar with these issues than I am

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Mar 22, 2018

So I've added two tests that should break, and they did indeed fail on powerpc and powerpc64, while they pass on powerpc64el: https://travis-ci.org/gnzlbg/stdsimd/builds/356837387

This is good news, since it means that we can at least use the powerpc and powerpc64 travis build bots to test big-endian hardware.

It is also good news, because the only way to do a bit-cast per the RFC is to use unsafe { mem::transmute(...) } which requires unsafe, and thus things breaking on big-endian is "ok-ish".

@hsivonen
Copy link
Member

I'm not really familiar with this beyond:

  1. Logic saying that lane-width-changing bitcasts should be expected to reveal endianness, which the above-mentioned Travis job shows to be the correct assumption.
  2. clang's arm_neon.h has a bunch of #ifdefs that in the big-endian case reverse the vector before and after calling whatever built-in a given ARM-defined vendor intrinsic is implemented as in LLVM, so in the big-endian ARM case as far as the behavior of ARM-defined intrinsics goes, endianness leaks outside LLVM itself to the headers provided by clang.

It's quite possible that whatever behavior LLVM's ARM-specific built-ins exhibit in the big-endian mode is preferable in terms of portability between little-endian and big-endian ARM compared to how ARM itself had defined them. I don't know.

According to Wikipedia, ARM was little-endian before it had a big-endian option. It's really hard to find information of any actual uses of big-endian ARM. It seems that people doing embedded systems in Rust are far more concerned about support for non-ARM architectures than about enabling big-endian ARM. So testing the ARM vendor intrinsics in practice in the big-endian case to find out if they behave portably may prove hard. But if it's too hard to find a way to test them, maybe it's an indication that they don't need to be tested in that configuration. Also, the behavior of ARM intrinsics is moot to the extent this issue is about portable SIMD.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Mar 22, 2018

Thanks @hsivonen . FWIW there is a PR documenting endian-dependent behavior that has some more tests: #394 and the travis powerpc buildbots allow us to test this just fine, so that problem is solved I guess.

For the RFC, the only way to trigger this behavior is by using unsafe code (mem::transmute, slice::from_raw_parts, and friends), so I'd say that if endianness issues are only revealed through unsafe code, then at least initially that's "ok-ish".

An API for safe bit-casts should in my opinion provide endian-independent behavior by default. We could and should experiment with doing something like what ARM headers do in the stdsimd FromBits/IntoBits traits. These are currently safe, but they fail on big-endian, so this is a bug that must be fixed either way.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Apr 11, 2018

Has been documented in https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/coresimd/tests/endian_tests.rs

Basically, transmuting between arrays of same element type and length as vectors works in an endian-independent way, but the behavior of transmuting to vectors of the same size but different numbers of lanes is endian dependent. Transmuting vectors to/from tuples of the same element type and number of elements happens to work but is undefined behavior, and transmuting to other tuples produces garbage since the elements of the tuple can be reordered and is therefore undefined behavior.

@gnzlbg gnzlbg closed this as completed Apr 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants