-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++][test][msan] Fix bots after #67799 #73152
[libc++][test][msan] Fix bots after #67799 #73152
Conversation
Created using spr 1.3.4
@llvm/pr-subscribers-libcxx Author: Vitaly Buka (vitalybuka) ChangesTests hangs on Full diff: https://github.com/llvm/llvm-project/pull/73152.diff 6 Files Affected:
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
index 4bd50396213d8b6..87ce7ab05d92e6c 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
@@ -8,7 +8,9 @@
// UNSUPPORTED: c++03, c++11, c++14, c++17
// UNSUPPORTED: target={{.+}}-windows-gnu
// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
-// XFAIL: tsan, msan
+// XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
// ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
// T exchange(T, memory_order = memory_order::seq_cst) volatile noexcept;
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
index bb0b7c5b5847290..0c59b50eb807051 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
@@ -10,6 +10,8 @@
// XFAIL: LIBCXX-AIX-FIXME
// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
// XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
// ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
// floating-point-type fetch_add(floating-point-type,
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
index b847cedf404a1c4..4435e3fc19b4e90 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
@@ -10,6 +10,8 @@
// XFAIL: LIBCXX-AIX-FIXME
// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
// XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
// ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
// floating-point-type fetch_sub(floating-point-type,
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
index 18c42d7b9bb13cb..b496f2f272c6ac8 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
@@ -9,6 +9,9 @@
// UNSUPPORTED: target={{.+}}-windows-gnu
// XFAIL: LIBCXX-AIX-FIXME
// ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
+// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
+// Hangs with msan.
+// UNSUPPORTED: msan
// floating-point-type operator-=(floating-point-type) volatile noexcept;
// floating-point-type operator-=(floating-point-type) noexcept;
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
index 6026d2392d9377e..299e300c897e56d 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
@@ -9,6 +9,9 @@
// UNSUPPORTED: target={{.+}}-windows-gnu
// XFAIL: LIBCXX-AIX-FIXME
// ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
+// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
+// Hangs with msan.
+// UNSUPPORTED: msan
// floating-point-type operator+=(floating-point-type) volatile noexcept;
// floating-point-type operator+=(floating-point-type) noexcept;
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp
index ce730b792a21e6a..e3a8c576ef0167e 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp
@@ -8,7 +8,9 @@
// UNSUPPORTED: c++03, c++11, c++14, c++17
// XFAIL: availability-synchronization_library-missing
// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
-// XFAIL: tsan, msan
+// XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
// ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
// void wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;
|
I'm a bit confused here. We have our own pre-commit sanitizer builds and they seemed to be okay with that change? Is it possible that your sanitizer bots are just less beefy than ours, so msan hangs on yours but not on ours? |
I don't know, slightly different environment? They way off from other tests:
|
Looks like out bot is about 2x faster: From https://github.com/llvm/llvm-project/actions/runs/6933781778/job/18867567142
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
The concern I have with disabling these tests as done in this patch is that it'll disable it for our CI as well, where these tests are running fine. And there's a lot of value in having these tests run under msan and tsan due to their nature. I can see that the tests are clearly super slow on your bot, I am just not certain what's the best way to address this without removing a lot of coverage for everyone else. Unrelated question: what additional coverage do your bots provide on top of the testing we already do with sanitizers? As a general guideline, libc++ doesn't support CI setups outside of our own pre-commit CI, because it leads to these kinds of difficulties where we don't control the setup. |
Our msan bot is using |
I really think for project like this there is more value in testing on more different environments than keeping a particular test alive, as you will get feedback from other platforms yearly. I myself rely appreciate when someone runs our sanitizers on bots with some platforms I have no access. Even I have no control over them, it's much easy resolve regressions soon after commit than after it gets into release. About coverage:
Regarding particular test, it claims clang is broken and XFAIL most of them any way. It's just XFAIL on timeout is not nice. |
...xx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
Outdated
Show resolved
Hide resolved
// XFAIL: tsan, msan | ||
// XFAIL: tsan | ||
// Hangs with msan. | ||
// UNSUPPORTED: msan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why it is hanging on your CI? It was fine on our msan CI. It XFAIL because long double
of x87_fp80 format has padding bits and at the moment, there is no way for clang to initialise those padding bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, now I want recover bots and investigate later, if there is an interest in support of these test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is x87_fp80 about msan? Msan, I guess, can just not take them into account?
I didn't look into details of float atomics, but if the type is visible to msan, it should be possible.
Any idea for tsan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is about msan .
TSAN is because of the GitHub issue in the comments. basically according to the discussion it is a problem in TSAN itself where it does not intercept any calls into libatomic
As I said half of them already XFAIL I can sweep every new failing test under the rug https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_functions.sh#L356 |
...xx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this to get your bots back to green, but @vitalybuka I'd really like you to try and dig into why these are hanging on msan in your setup (which we can't do ourselves cause it doesn't behave that way for us).
Technically you can, the bot is on public GCE and the script which sets up our bot is published But I totally understand that this is a addition burden for you, and will do some investigation and get back with results. |
maybe msan on aarch64 is bit different? |
4 of UNSUPPORTED tests are there because of x86_64 Ubuntu build. |
Exactly same build does not reproduce on my workstation, with Linux distribution which is from the bot. |
This makes me think the timeout simply is because the test is a bit slow when compiling with msan? there is a loop around the whole test point (to increase the likelyhood to fail in case a buggy source code). I wander reduce the loop count could help here |
Is the hanging on x86_64 or aarch64? |
Both, different tests |
I attached debugger and it infinitely spinning
|
|
@huixie90 This smells awfully like the padding bytes issue for |
Buildbot is:
Workstation:
I copied compiled test binary and libc++ so files from the bot to the workstation and it does not hang. |
Note that |
Also I replaced 13.1.0-2ubuntu2~23.04 with 13.2.0-4 and now all tests pass on the bot. |
Ubuntu 23.10 with libatomic 13.2.0-4 also hangs. |
Yea the stacktrace clearly shows that it is long double. Yes we did have a big report from 2020 as we seen in the code review. the things I don’t understand is could we clarify in this build:
In your debugger, is that is_fp80_long_double return true? |
As I wrote, I can reproduce exactly the same build only on Ubuntu, but no Debian
Lets ignore that one, as it fails on the different tests. Maybe a different issue.
yes, clang is build from the same revision as your patch
No
Yes I suspect you can reproduce on ubuntu of the same version using Docker or similar. |
aarch64 is actually good both test are unexpectedly passed because there is not fp80, but they where marked as XFAIL
|
So the problem is specifically to
The loop that it got stucks in is a CAS loop where there is an existing problem for fp80 where compare-exchange always return false because of the padding. Inside the loop I have a workaround to copy the memory contents to “old” on a failing compare-exchange (and there is another exiting bug here that it does not seem to update the old for fp80) . Note normal assignment would not work because it does not assign the padding bits. I suspect MSAN is intercepting my workaround here. And yes we should really properly fix the issue rather than applying more and more workaround but we need the compiler side to help us. |
Undo a part of #73152. These test do not hang, but unexpectedlty pass on aarch64 https://lab.llvm.org/buildbot/#/builders/74/builds/23708 Tests work on aarch64 and probably any platform without fp80.
Tests hangs on
https://lab.llvm.org/buildbot/#/builders/sanitizer-x86_64-linux-bootstrap-msan
https://lab.llvm.org/buildbot/#/builders/sanitizer-aarch64-linux-bootstrap-msan