yescrypt test fails on Linux/x86_64 systems configured with 1 GiB huge pages #152
Can you obtain a backtrace if you run it manually? |
How do I run it manually? |
@alexforencich Can you try without
Can you share more detail about the system - compiler, etc.? Also, is this in a VM, and if so, with how much RAM? |
At least this one looks like an out-of-memory condition in a tiny VM to me. |
I will re-run without that flag. This is not in a VM, this is bare metal on a 64 core AMD EPYC Zen 2 with 256 GB of RAM. So it's definitely not running out of RAM. |
Seems like removing that flag made no difference to the test suite output:
|
Some system information:
Build works just fine on a machine with an Intel CPU, with the same OS and compiler versions. |
Thank you, @alexforencich. This is puzzling because the difference between these two is supposed to be in memory usage (16 MiB vs. 1 GiB):
Further, it explicitly knows it "FAILED" - it does not simply return a miscomputed value. This is why I thought of memory allocation error as the most likely cause. Even further, the value it returned for the 16 MiB case is the correct one. If not total RAM, then maybe e.g. a low |
Oh, here's another idea - maybe something weird goes on with allocation of huge pages. We have this in
Obviously, one of these values is below and the other above this threshold. Can you try playing with this threshold to see if it makes a difference - e.g., replace the |
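For context, here is a minimal sketch of the threshold logic being referred to (my illustration, reusing the HUGEPAGE_THRESHOLD constant that also appears in the patch quoted later in this thread; it is not the exact libxcrypt source):

```c
#include <stddef.h>
#include <sys/mman.h>

/* Allocations at or above this size attempt an explicit huge-page mapping;
 * the 16 MiB test vector stays on normal pages, while the 1 GiB one
 * triggers MAP_HUGETLB. */
#define HUGEPAGE_THRESHOLD (32 * 1024 * 1024)

static int mmap_flags_for(size_t size)
{
    int flags = MAP_ANON | MAP_PRIVATE;
#ifdef MAP_HUGETLB
    if (size >= HUGEPAGE_THRESHOLD)
        flags |= MAP_HUGETLB;
#endif
    return flags;
}
```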
Aha, possibly something strange is going on with hugepages. On the machines in question, hugepages are configured like so:
So if the tests are trying to allocate too many huge pages, then something will fail. I'll make that change and see what happens when re-running the tests. |
The code assumes that there are 2M-sized huge pages, and maybe you only have 1G-sized instead? The code aligns the size for 2M, not for 1G, so the allocation will likely fail for that reason. However, it should then retry without huge pages at all. So it's still puzzling why it fails visibly. |
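For reference, a sketch of the 2 MiB alignment assumption being described (the hard-coded constant is the assumption under discussion; this is illustrative, not a copy of the library's code):

```c
#include <stddef.h>

/* Assumed huge page size of 2 MiB, as discussed above. */
#define HUGEPAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Round a requested allocation up to the assumed huge page size. With a
 * 1 GiB default huge page size the kernel rounds the mapping up further
 * on its own, which is where the mismatch comes from. */
static size_t round_up_to_hugepage(size_t size)
{
    size_t mask = HUGEPAGE_SIZE - 1;
    if (size + mask < size)     /* overflow guard */
        return size;
    return (size + mask) & ~mask;
}
```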
Tests pass with that change. Definitely looks like there is something wrong with how hugepages are handled here; it looks like the code is expecting hugepages to be 2 MB, but on those machines hugepages are 1 GB. |
On the problematic machine:
On a machine that works:
|
As I understand, Zen 2 (and thus EPYC) does support 2 MB pages as well - so it's a matter of kernel configurations. Regardless, I don't see why the expected fallback to non-huge pages apparently also failed, and why some of the |
After making that change, all of the tests passed. Yes, the machines probably support 2 MB hugepages, but as the machines are used for networking applications with DPDK, the hugepage size is set to 1 GB. But an encryption library should not be dependent on the kernel configuration in this way. Also, I did a bit of poking around, and the alg-yescrypt test binary reports a segfault at the end of the test. I'm assuming this is probably a result of attempting to free a buffer that wasn't actually allocated in the first place or something along those lines, but I'll see if I can get a stack trace. |
Oh, I think I understand - they were followed by
So we just need to figure out what happened with the fallback, which is here:
base = mmap(NULL, new_size, PROT_READ | PROT_WRITE, flags, -1, 0);
if (base != MAP_FAILED) {
	base_size = new_size;
} else if (flags & MAP_HUGETLB) {
	flags &= ~MAP_HUGETLB;
	base = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
}
The expectation is that we'd get |
What file is that in? I can add some prints and run it again. |
This is in |
Segfault backtrace:
|
The segfault is because the test (not the code being tested) is careless - it does not check for
uint8_t *hash = yescrypt(
		(const uint8_t *)"pleaseletmein", setting);
printf("Plaintext: '%s'\n", (char *)hash);
hash = (uint8_t *)strdup((char *)hash);
We could fix this or not. The really important fix should be to the actual code being tested, to make the fallback work. |
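For illustration, a minimal sketch of the kind of guard the test could add (my example, not a patch from the thread; it assumes the quoted test context, that the function returns NULL on allocation failure, and that a plain return is an acceptable way to bail out of the surrounding test function):

```c
uint8_t *hash = yescrypt(
		(const uint8_t *)"pleaseletmein", setting);
if (!hash) {
	/* Hypothetical handling: report the failure instead of passing
	 * a NULL pointer on to the following calls, which is what
	 * currently segfaults. */
	puts("yescrypt() FAILED");
	return;
}
printf("Plaintext: '%s'\n", (char *)hash);
hash = (uint8_t *)strdup((char *)hash);
```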
So I instrumented the code like so:
And I am getting:
So it looks like mmap succeeded, but it failed somewhere else? |
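The exact instrumentation isn't preserved in this archived thread; a hypothetical helper in the same spirit (names and output format are mine) could look like this:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical tracing wrapper: log every mmap() attempt in alloc_region()
 * so the failing call can be pinpointed. */
static void *traced_mmap(size_t len, int flags)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    fprintf(stderr, "mmap(len=%zu, flags=%#x) -> %p, errno=%d\n",
            len, (unsigned)flags, p, p == MAP_FAILED ? errno : 0);
    return p;
}
```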
Looks so! Meanwhile, I've just tested the fallback by making the first |
@alexforencich Are you sure you reverted the threshold adjustment first - that is, that |
I reverted the adjustment, and if you look at the pointer LSBs, it definitely looks like it allocated a 1GB huge page:
|
Oh, right. I assume you can also reproduce the same by:
|
yep, that one seems to report the same failure:
|
I cannot reproduce the problem easily, so would appreciate it if you figure it out on this smaller codebase. It should be rather straightforward to see where the code detects it's failing for you. |
Would remote access to one of the machines in question be helpful? |
Alright, here's something strange: I just did some testing of that yescrypt code on two other machines (with Xeon CPUs), otherwise identical to each other except for the hugepages configuration (one is default, one has 16x 1G hugepages), and I'm not seeing the same failure there. However, the test case fails somewhere else. But these machines are running Ubuntu 18.04 instead of Arch. |
Here's my current best guess: the failure is on
Maybe this is related to the size we're unmapping not exactly matching the size the kernel actually maps. Specifically, we call
@alexforencich You should be able to easily confirm this with
If confirmed, a question would be whether this is a kernel bug. Arguably, if it's going to disallow
Another question would be what to do about this, if at all. Share a reduced test case with Linux kernel maintainers? I don't currently have a good idea for an efficient and safe workaround we could include. Setting our
If my guesses above are all correct, then I don't currently see what else I'd test - short of debugging kernel internals. Otherwise, yes, I'd see where else it fails - although I don't currently see any other realistic possibility.
Can you show where it fails on Intel? I assume it also fails only on the machine with 1G pages? |
Running the alg-yescrypt test with strace results in:
I have not tried changing the hugepage size yet. But yes, this certainly could be a kernel bug, and that would explain why I'm not seeing the same issue on the two Intel servers, despite one of them being configured for a huge page size of 1 GB, since they're running a significantly older kernel. The two Intel machines seem to be returning the same test results:
Seems like an unrelated test is failing for some reason; it is reported as failing on both machines even though only one has hugepages set to 1G. Additionally, there seems to be an issue with one of the test scripts finding 'time'. This is what I get on the AMD machine:
And strace shows:
|
I think that's actually why it's failing - can you try installing |
That's what's odd, it seems to be installed:
|
Hmm, there is something strange going on here as well. If I run the PHC test manually, it works:
But if I run the other set of tests manually, it fails on the 1G hugepage machine but not on the other one. First machine, with 1G huge pages:
Second machine, default config:
Looks like I misread the diff output as the test output, and the test was simply not run as time was not found, for some reason. So, whatever is going on, it's still a problem under kernel version 4.15.0. |
And under strace, we can compare what happens. Defaults:
1G huge pages:
|
Hmm, seems like this is a documented "feature": https://man7.org/linux/man-pages/man2/mmap.2.html
For mmap, the length is internally rounded up to the huge page size, but for munmap this does not happen and the length needs to be correct.
So, what's the proper solution? Seems like mmap supports MAP_HUGE_2MB and MAP_HUGE_1GB, so I think using these may be the "best" solution here... if you're going to assume a 2 MB huge page size, just force mmap to use only 2 MB huge pages, and otherwise fail. I did a quick test adding MAP_HUGE_2MB alongside MAP_HUGETLB, and the tests passed, as the first call to mmap failed due to no 2 MB huge pages being available (FYI, I also needed to include linux/mman.h for MAP_HUGE_2MB).
But another consideration may be to drop explicit support for huge pages altogether and instead use transparent hugepages via madvise with MADV_HUGEPAGE. See https://www.kernel.org/doc/Documentation/vm/transhuge.txt . |
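For reference, a minimal stand-alone reproducer of the munmap() behavior described above (a sketch added for illustration, not part of the test suite; it only demonstrates the failure on a system whose default huge page size is 1 GiB and that has huge pages reserved):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* 32 MiB: a multiple of 2 MiB, but not of 1 GiB. */
    size_t len = 32 * 1024 * 1024;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");   /* e.g., no huge pages reserved at all */
        return 1;
    }

    /* With a 1 GiB default huge page size, the kernel rounds the mapping
     * up to 1 GiB, yet munmap() still requires a length that is a multiple
     * of the huge page size, so this call fails with EINVAL. */
    if (munmap(p, len) != 0)
        perror("munmap");

    return 0;
}
```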
Regarding
Thank you for figuring out that the |
Trying
On the same old system, |
Oh, that would make sense, I think Ubuntu uses
Possibly hugepages vs. transparent huge pages vs. normal pages is worth some benchmarks. Seems like transparent huge pages should be available on most systems, since the default setting seems to be |
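As an illustration of the madvise route mentioned above, here is a minimal sketch (my example, not code from this repository; it assumes transparent hugepages are enabled at least in "madvise" mode, and the size is arbitrary):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t size = 32 * 1024 * 1024;
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
#ifdef MADV_HUGEPAGE
    /* Ask the kernel to back this range with transparent huge pages;
     * this is only a hint, so failure is not treated as fatal. */
    (void)madvise(base, size, MADV_HUGEPAGE);
#endif
    /* ... use the region as scratch memory ... */
    munmap(base, size);
    return 0;
}
```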
Would it make sense to disable huge page support until these issues are sorted out? |
@Mic92 Yes, I'm considering either dropping explicit huge pages support or using |
So I was thinking that maybe on recent kernels explicit huge page support was no longer helpful - but no, on Fedora 36 with a 5.18-ish kernel I am getting +4% speed at one hash computation and +3% at many concurrent computations for a 1 GiB per-hash allocation (useful for the KDF use case of upstream yescrypt, and not so much for libxcrypt's password hashing use case). To achieve these speedups, I have to pre-allocate huge pages with the
Anyway, the below patch retains those speedups in that special case while hopefully fixing the issue reported here:
+++ b/yescrypt-platform.c
@@ -21,6 +21,9 @@
#ifdef __unix__
#include <sys/mman.h>
#endif
+#ifdef __linux__
+#include <linux/mman.h> /* for MAP_HUGE_2MB */
+#endif
#define HUGEPAGE_THRESHOLD (32 * 1024 * 1024)
@@ -40,11 +43,11 @@ static void *alloc_region(yescrypt_region_t *region, size_t size)
MAP_NOCORE |
#endif
MAP_ANON | MAP_PRIVATE;
-#if defined(MAP_HUGETLB) && defined(HUGEPAGE_SIZE)
+#if defined(MAP_HUGETLB) && defined(MAP_HUGE_2MB) && defined(HUGEPAGE_SIZE)
size_t new_size = size;
const size_t hugepage_mask = (size_t)HUGEPAGE_SIZE - 1;
if (size >= HUGEPAGE_THRESHOLD && size + hugepage_mask >= size) {
- flags |= MAP_HUGETLB;
+ flags |= MAP_HUGETLB | MAP_HUGE_2MB;
/*
* Linux's munmap() fails on MAP_HUGETLB mappings if size is not a multiple of
* huge page size, so let's round up to huge page size here.
@@ -56,7 +59,7 @@ static void *alloc_region(yescrypt_region_t *region, size_t size)
if (base != MAP_FAILED) {
base_size = new_size;
} else if (flags & MAP_HUGETLB) {
- flags &= ~MAP_HUGETLB;
+ flags &= ~(MAP_HUGETLB | MAP_HUGE_2MB);
base = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
}
Its downside is that the minimum kernel version for explicit usage of huge pages is increased from 2.6.32 (RHEL6) to 3.8 (RHEL7). IIRC, I had previously found huge pages making the most performance difference on RHEL6 (or its rebuild distros), so that's kind of a pity, but then that distro is EOL'ed. I think I'll get the above patch into upstream yescrypt, and libxcrypt can do the same to minimize the differences. |
There's also a 1% speedup at 16 MiB (which we currently typically use for password hashing with libxcrypt), but the threshold is set at 32 MiB. Initially (prior to the yescrypt 1.0 release), I had the threshold at 12 MiB (based on testing back then). The increase to 32 MiB was to better accommodate concurrent RAM+ROM usage, where having only the ROM on huge pages (typically SysV shm) resulted in a slight speedup on some systems compared to keeping both RAM and ROM on huge pages - I guess due to use of different TLBs vs. competition for the one same huge page TLB. yespower - a related PoW scheme, which doesn't have the ROM - has the threshold at 12 MiB.

Since libxcrypt does not expose yescrypt's ROM support, it could also lower the threshold to 12 MiB once we fix the issue here. A drawback is that it'd actually consume almost 2 MiB of extra RAM (but only out of pre-allocated huge pages, so OK?) since the typical allocation is just slightly higher than 16 MiB (so it'd end up with 18 MiB). |
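To make the 18 MiB figure concrete, a small sketch of the rounding arithmetic (the exact overhead above 16 MiB is an assumption here, not a value taken from the code):

```c
#include <stdio.h>

int main(void)
{
    size_t hugepage = (size_t)2 * 1024 * 1024;               /* assumed 2 MiB pages */
    size_t request  = (size_t)16 * 1024 * 1024 + 64 * 1024;  /* "slightly higher than 16 MiB"; overhead is illustrative */
    size_t rounded  = (request + hugepage - 1) & ~(hugepage - 1);
    printf("rounded allocation: %zu MiB\n", rounded >> 20);  /* prints 18 */
    return 0;
}
```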
Thanks. I just applied the patch and it works for both 1GiB and 2MiB pages on a system that used to fail before. |
Thanks for testing. I've just pushed this fix to upstream yescrypt: openwall/yescrypt@1554b08 |
Could we get a patch release for nixpkgs? NixOS/nixpkgs#201653 |
cc @besser82 ^ |
When building this package on an AMD EPYC Zen 2 CPU, the tests fail, including one reported segfault.
Configured with
Test suite run output:
test-suite.log: