-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unions used for type-punning should be repr(C)
#588
Comments
So after thinking about this a bit, I believe this The problem is the following. The call ABI of this That is, every time you use a vector operation, which from looking at the code, happens more often than array operations, we have to copy the union from the stack to a vector register, do the operation, and copy it back to the stack. This appears to be happening all the time. Sadly, Rust can't do better for you here, e.g., it cannot make this union have a Vector ABI. The problem is, that whether you want an Aggregate ABI or a Vector ABI depends on how you will be using this union most of the time. Last time I heard, performance of this code is great. But that's only because LLVM is able to undo all the consecutive stack->vector->stack->vector... spills, and keep everything in a vector. Otherwise I'd guess that this would perform slower than having no vectorization at all. So my recommendation here would be to keep the union if most of the time you are operating on it like an array. The code appears to be operating on it almost always like a vector, though. So if that's the case, I think it would be better to commit to that, and use a type that has the Vector ABI, creating an array locally only when you need it, and being aware that this will spill the contents of the vector to the stack. For that you can use a type alias or a newtype for the For copying the vector into an array and back, you probably want to just use |
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced unspecified behavior because the field offsets of repr(Rust) unions are not guaranteed to be at offset 0, so that field access was potentially UB. This commit fixes that, and closes rust-lang#588 . The unions were also generating a lot of unnecessary memory operations. This commit fixes that as well. The issue is that unions have an Aggregate call ABI, which is the same as the call ABI of arrays. That is, they are passed around by memory, and not in Vector registers. This is good, if most of the time one operates on them as arrays. This was, however, not the case. Most of the operations on these unions are using SIMD instructions. This means that the union needs to be copied into a SIMD register, operated on, and then spilled back to the stack, on every single operation. That's unnecessary, although apparently LLVM was able to optimize all the unnecessary memory operations away and leave these always in registers. This commit fixes this issue as well, by making the u8x16 and u8x32 repr(transparent) newtypes over the architecture specific vector types, giving them the Vector ABI. The vectors are then copied to the stack only when necessary, and as little as possible. This is done using mem::transmute, removing the need for unions altogether (fixing rust-lang#588 by not having to worry about union layout at all). To make it clear when the vectors are spilled into the stack, the vector::replace(index, value) API has been removed, and instead, only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8; N]) APIs are provided instead. This prevents spilling the vectors back and forth onto the stack every time an index needs to be modified, by using vector::bytes to spill the vector to the stack once, making all the random-access modifications in memory, and then using vector::from_bytes only once to move the memory back into a SIMD register.
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced unspecified behavior because the field offsets of repr(Rust) unions are not guaranteed to be at offset 0, so that field access was potentially UB. This commit fixes that, and closes rust-lang#588 . The unions were also generating a lot of unnecessary memory operations. This commit fixes that as well. The issue is that unions have an Aggregate call ABI, which is the same as the call ABI of arrays. That is, they are passed around by memory, and not in Vector registers. This is good, if most of the time one operates on them as arrays. This was, however, not the case. Most of the operations on these unions are using SIMD instructions. This means that the union needs to be copied into a SIMD register, operated on, and then spilled back to the stack, on every single operation. That's unnecessary, although apparently LLVM was able to optimize all the unnecessary memory operations away and leave these always in registers. This commit fixes this issue as well, by making the u8x16 and u8x32 repr(transparent) newtypes over the architecture specific vector types, giving them the Vector ABI. The vectors are then copied to the stack only when necessary, and as little as possible. This is done using mem::transmute, removing the need for unions altogether (fixing rust-lang#588 by not having to worry about union layout at all). To make it clear when the vectors are spilled into the stack, the vector::replace(index, value) API has been removed, and instead, only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8; N]) APIs are provided instead. This prevents spilling the vectors back and forth onto the stack every time an index needs to be modified, by using vector::bytes to spill the vector to the stack once, making all the random-access modifications in memory, and then using vector::from_bytes only once to move the memory back into a SIMD register.
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced unspecified behavior because the field offsets of repr(Rust) unions are not guaranteed to be at offset 0, so that field access was potentially UB. This commit fixes that, and closes rust-lang#588 . The unions were also generating a lot of unnecessary memory operations. This commit fixes that as well. The issue is that unions have an Aggregate call ABI, which is the same as the call ABI of arrays. That is, they are passed around by memory, and not in Vector registers. This is good, if most of the time one operates on them as arrays. This was, however, not the case. Most of the operations on these unions are using SIMD instructions. This means that the union needs to be copied into a SIMD register, operated on, and then spilled back to the stack, on every single operation. That's unnecessary, although apparently LLVM was able to optimize all the unnecessary memory operations away and leave these always in registers. This commit fixes this issue as well, by making the u8x16 and u8x32 repr(transparent) newtypes over the architecture specific vector types, giving them the Vector ABI. The vectors are then copied to the stack only when necessary, and as little as possible. This is done using mem::transmute, removing the need for unions altogether (fixing rust-lang#588 by not having to worry about union layout at all). To make it clear when the vectors are spilled into the stack, the vector::replace(index, value) and vector::extract(index) APIs have been removed, and instead, only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8; N]) APIs are provided instead. This prevents spilling the vectors back and forth onto the stack every time an index needs to be modified, by using vector::bytes to spill the vector to the stack once, making all the random-access modifications in memory, and then using vector::from_bytes only once to move the memory back into a SIMD register. fixup
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced unspecified behavior because the field offsets of repr(Rust) unions are not guaranteed to be at offset 0, so that field access was potentially UB. This commit fixes that, and closes rust-lang#588 . The unions were also generating a lot of unnecessary memory operations. This commit fixes that as well. The issue is that unions have an Aggregate call ABI, which is the same as the call ABI of arrays. That is, they are passed around by memory, and not in Vector registers. This is good, if most of the time one operates on them as arrays. This was, however, not the case. Most of the operations on these unions are using SIMD instructions. This means that the union needs to be copied into a SIMD register, operated on, and then spilled back to the stack, on every single operation. That's unnecessary, although apparently LLVM was able to optimize all the unnecessary memory operations away and leave these always in registers. This commit fixes this issue as well, by making the u8x16 and u8x32 repr(transparent) newtypes over the architecture specific vector types, giving them the Vector ABI. The vectors are then copied to the stack only when necessary, and as little as possible. This is done using mem::transmute, removing the need for unions altogether (fixing rust-lang#588 by not having to worry about union layout at all). To make it clear when the vectors are spilled into the stack, the vector::replace(index, value) and vector::extract(index) APIs have been removed, and instead, only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8; N]) APIs are provided instead. This prevents spilling the vectors back and forth onto the stack every time an index needs to be modified, by using vector::bytes to spill the vector to the stack once, making all the random-access modifications in memory, and then using vector::from_bytes only once to move the memory back into a SIMD register.
@RalfJung Thanks so much for reporting this! It should be fixed in |
At
regex/src/vector/ssse3.rs
Lines 80 to 83 in 172898a
a union is declared for the purpose of type-punning. However, without
repr(C)
, the fields of that union might not be at offset 0, which makes it unsuited for type punning. To get such layout guarantees, you need to specifyrepr(C)
.Also see https://rust-lang.github.io/unsafe-code-guidelines/layout/unions.html.
The text was updated successfully, but these errors were encountered: