-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unaligned load and stores: (#4528) #4531
Conversation
Note: I pushed a separate commit for the xxhash fix. I think the SlabAllocator fix should be uncontroversial, but the xxhash fix should be considered separately. |
@@ -260,7 +260,13 @@ FORCE_INLINE U64 | |||
XXH_readLE64_align(const void* ptr, XXH_endianess endian, XXH_alignment align) | |||
{ | |||
if (align == XXH_unaligned) | |||
return endian == XXH_littleEndian ? A64(ptr) : XXH_swap64(A64(ptr)); | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The XXHASH code has special config for this called XXH_USE_UNALIGNED_ACCESS. You can see it in https://github.com/XRPLF/rippled/blob/develop/src/ripple/beast/hash/impl/xxhash.cpp#L130 and maybe you can set this for compiling without making the changes to the external code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like that would have a larger performance penalty than the change I made (although I admit I haven't looked into it that deeply). If people would rather leave it unmodified we can look at the perf penalty of XXH_USE_UNALIGNED_ACCESS
; I think I'm weakly in favor of modifying xxhash with memcpy, but I'm fine backing out the change if there are objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'm fine with modifying xxhash
with memcpy()
. We've had this copy of the code in our code base for nine years or so. And it looks like our copy is already not quite pristine. Putting the memcpy()
where we know we need it makes the patch easy to understand. Messing with the macro introduces more variables and makes the change riskier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okey
I think that doing this change changse the external files from the XXHASH code and making future upgrades more tricky. But this is frequently done in this code base anyways and not updating often like with the very old secp256k1 lib so maybe its okey for this also!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether using xxhash
makes sense. For a lot of what's being hashed, the data is already effectively random (SHA-256 hashes and the like) so using a great hash to hash it again seems a little pointless, especially if we're having to introduce a dependency (that, as noted, isn't really kept up to date).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Nik that hashing something that's already a SHA-256 hash doesn't add value. We could look to see if we could remove XXHASH. I'll make a issue.
Edit: Issue is here: #4547
#include <boost/align.hpp> | ||
#include <boost/container/static_vector.hpp> | ||
#include <boost/predef.h> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the predef ok to remove from here because the code uses it for BOOST_OS_LINUX on next ligne. Is it maybe better to remove the linux special code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These includes are still here, I just moved them earlier in the file. (We usually put boost includes should come before system includes). This isn't needed for this patch, but I thought I'd clean that up while I was edit this file.
As for removing the linux specific code: It looks like Nik is ambivalent about the hint, and it's possible he may reconsider that, but I don't think we should touch that in this patch.
@@ -76,7 +78,9 @@ class SlabAllocator | |||
|
|||
while (data + item <= p_ + size_) | |||
{ | |||
*reinterpret_cast<std::uint8_t**>(data) = l_; | |||
// Use memcpy to avoid unaligned UB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use the std:: prefix for memcpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in a553a13 [fold] Replace memcpy with std::memcpy
Edit: I also confirmed the compiler optimized std::memcpy
the same way it did memcpy
(ref: https://godbolt.org/z/4E1YT4Gs6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks for doing this. The added memcpy()
s all look right to me. And I especially appreciate the comments indicating why the memcpy()
s are there. Additionally, code coverage is excellent on all of the changed lines.
Have you had a chance to run UBSan on the modified code? It would be great if these changes fix the reports. But even if it doesn't fix the UBSan reports I think these are all good changes.
@scottschurr Yes, I have run the sanitizer, this does not fix all the issues. The remaining issues are either in test code or part of nubd. I would like to get this down to zero, but this is a good first step.
|
@seelabs, sorry, I wasn't clear. I didn't expect these changes to fix all the issues. I just wanted to confirm that the |
@@ -260,7 +260,13 @@ FORCE_INLINE U64 | |||
XXH_readLE64_align(const void* ptr, XXH_endianess endian, XXH_alignment align) | |||
{ | |||
if (align == XXH_unaligned) | |||
return endian == XXH_littleEndian ? A64(ptr) : XXH_swap64(A64(ptr)); | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether using xxhash
makes sense. For a lot of what's being hashed, the data is already effectively random (SHA-256 hashes and the like) so using a great hash to hash it again seems a little pointless, especially if we're having to introduce a dependency (that, as noted, isn't really kept up to date).
@@ -76,7 +78,9 @@ class SlabAllocator | |||
|
|||
while (data + item <= p_ + size_) | |||
{ | |||
*reinterpret_cast<std::uint8_t**>(data) = l_; | |||
// Use memcpy to avoid unaligned UB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine.
I'll default to holding this until 1.12 (expected ~Sept 2023) unless anyone comments that they would like to see it included in 1.11 (expected June 2023). |
Unaligned load and stores are supported by both intel and arm CPUs, however, this is UB in C++. Replacing this with a `memcpy` fixes the undefined behavior and the compiled assembly code is equivalent to the original (so there is no penalty to using memcpy).
Squahsed and forced pushed. |
@intelliot I'm fine holding this until 1.12 |
This was discussed today and it is OK to include in 1.11. I intend to merge it shortly. |
Unaligned load and stores are supported by both intel and arm CPUs, however, this is UB in C++. Replacing this with a
memcpy
fixes the undefined behavior and the compiled assembly code is equivalent to the original (so there is no penalty to using memcpy).Fix #4528