Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++][test][msan] Fix bots after #67799 #73152

Merged
merged 2 commits into from
Nov 22, 2023

Conversation

Created using spr 1.3.4
@vitalybuka vitalybuka requested a review from a team as a code owner November 22, 2023 17:58
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 22, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-libcxx

Author: Vitaly Buka (vitalybuka)

Changes

Tests hangs on
https://lab.llvm.org/buildbot/#/builders/sanitizer-x86_64-linux-bootstrap-msan
https://lab.llvm.org/buildbot/#/builders/sanitizer-aarch64-linux-bootstrap-msan


Full diff: https://github.com/llvm/llvm-project/pull/73152.diff

6 Files Affected:

  • (modified) libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp (+3-1)
  • (modified) libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp (+2)
  • (modified) libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp (+2)
  • (modified) libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp (+3)
  • (modified) libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp (+3)
  • (modified) libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp (+3-1)
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
index 4bd50396213d8b6..87ce7ab05d92e6c 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
@@ -8,7 +8,9 @@
 // UNSUPPORTED: c++03, c++11, c++14, c++17
 // UNSUPPORTED: target={{.+}}-windows-gnu
 // Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
-// XFAIL: tsan, msan
+// XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
 // ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
 
 //  T exchange(T, memory_order = memory_order::seq_cst) volatile noexcept;
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
index bb0b7c5b5847290..0c59b50eb807051 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
@@ -10,6 +10,8 @@
 // XFAIL: LIBCXX-AIX-FIXME
 // Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
 // XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
 // ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
 
 // floating-point-type fetch_add(floating-point-type,
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
index b847cedf404a1c4..4435e3fc19b4e90 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
@@ -10,6 +10,8 @@
 // XFAIL: LIBCXX-AIX-FIXME
 // Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
 // XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
 // ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
 
 // floating-point-type fetch_sub(floating-point-type,
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
index 18c42d7b9bb13cb..b496f2f272c6ac8 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
@@ -9,6 +9,9 @@
 // UNSUPPORTED: target={{.+}}-windows-gnu
 // XFAIL: LIBCXX-AIX-FIXME
 // ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
+// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
+// Hangs with msan.
+// UNSUPPORTED: msan
 
 // floating-point-type operator-=(floating-point-type) volatile noexcept;
 // floating-point-type operator-=(floating-point-type) noexcept;
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
index 6026d2392d9377e..299e300c897e56d 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
@@ -9,6 +9,9 @@
 // UNSUPPORTED: target={{.+}}-windows-gnu
 // XFAIL: LIBCXX-AIX-FIXME
 // ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
+// Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
+// Hangs with msan.
+// UNSUPPORTED: msan
 
 // floating-point-type operator+=(floating-point-type) volatile noexcept;
 // floating-point-type operator+=(floating-point-type) noexcept;
diff --git a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp
index ce730b792a21e6a..e3a8c576ef0167e 100644
--- a/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp
+++ b/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp
@@ -8,7 +8,9 @@
 // UNSUPPORTED: c++03, c++11, c++14, c++17
 // XFAIL: availability-synchronization_library-missing
 // Clang's support for atomic operations on long double is broken. See https://github.com/llvm/llvm-project/issues/72893
-// XFAIL: tsan, msan
+// XFAIL: tsan
+// Hangs with msan.
+// UNSUPPORTED: msan
 // ADDITIONAL_COMPILE_FLAGS(has-latomic): -latomic
 
 // void wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;

@ldionne
Copy link
Member

ldionne commented Nov 22, 2023

I'm a bit confused here. We have our own pre-commit sanitizer builds and they seemed to be okay with that change? Is it possible that your sanitizer bots are just less beefy than ours, so msan hangs on yours but not on ours?

@vitalybuka
Copy link
Collaborator Author

vitalybuka commented Nov 22, 2023

I'm a bit confused here. We have our own pre-commit sanitizer builds and they seemed to be okay with that change? Is it possible that your sanitizer bots are just less beefy than ours, so msan hangs on yours but not on ours?

I don't know, slightly different environment?

They way off from other tests:

1500.03s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.types.generic/atomics.types.float/operator.minus_equals.pass.cpp
1500.03s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.types.generic/atomics.types.float/fetch_add.pass.cpp
1500.03s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.types.generic/atomics.types.float/fetch_sub.pass.cpp
1500.02s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp
426.52s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
416.55s: llvm-libc++-shared.cfg.in :: std/numerics/rand/rand.dist/rand.dist.bern/rand.dist.bern.geo/eval.pass.cpp
394.60s: llvm-libc++-shared.cfg.in :: std/containers/sequences/deque/deque.modifiers/insert_size_value.pass.cpp
288.27s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_assignment.pass.cpp

@vitalybuka
Copy link
Collaborator Author

Looks like out bot is about 2x faster:

From https://github.com/llvm/llvm-project/actions/runs/6933781778/job/18867567142

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
1500.02s: llvm-libc++-shared.cfg.in :: std/numerics/rand/rand.dist/rand.dist.bern/rand.dist.bern.negbin/eval.pass.cpp
1500.01s: llvm-libc++-shared.cfg.in :: std/containers/sequences/deque/deque.modifiers/insert_iter_iter.pass.cpp
1290.75s: llvm-libc++-shared.cfg.in :: std/numerics/rand/rand.dist/rand.dist.samp/rand.dist.samp.discrete/eval.pass.cpp
1081.18s: llvm-libc++-shared.cfg.in :: std/numerics/rand/rand.dist/rand.dist.bern/rand.dist.bern.geo/eval.pass.cpp
1030.19s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp

Copy link
Member

@mordante mordante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ldionne
Copy link
Member

ldionne commented Nov 22, 2023

The concern I have with disabling these tests as done in this patch is that it'll disable it for our CI as well, where these tests are running fine. And there's a lot of value in having these tests run under msan and tsan due to their nature. I can see that the tests are clearly super slow on your bot, I am just not certain what's the best way to address this without removing a lot of coverage for everyone else.

Unrelated question: what additional coverage do your bots provide on top of the testing we already do with sanitizers? As a general guideline, libc++ doesn't support CI setups outside of our own pre-commit CI, because it leads to these kinds of difficulties where we don't control the setup.

@ldionne
Copy link
Member

ldionne commented Nov 22, 2023

Our msan bot is using MemoryWithOrigins, not just Memory. @vitalybuka do you know if that could be the cause of the large difference we're seeing here?

@vitalybuka
Copy link
Collaborator Author

vitalybuka commented Nov 22, 2023

The concern I have with disabling these tests as done in this patch is that it'll disable it for our CI as well, where these tests are running fine. And there's a lot of value in having these tests run under msan and tsan due to their nature. I can see that the tests are clearly super slow on your bot, I am just not certain what's the best way to address this without removing a lot of coverage for everyone else.

Unrelated question: what additional coverage do your bots provide on top of the testing we already do with sanitizers? As a general guideline, libc++ doesn't support CI setups outside of our own pre-commit CI, because it leads to these kinds of difficulties where we don't control the setup.

I really think for project like this there is more value in testing on more different environments than keeping a particular test alive, as you will get feedback from other platforms yearly.

I myself rely appreciate when someone runs our sanitizers on bots with some platforms I have no access. Even I have no control over them, it's much easy resolve regressions soon after commit than after it gets into release.

About coverage:

  1. we have aarch64 asan/msan/hwasan.
  2. clang is build from the tree one the same revision - we want to know when llvm changes causing sanitizers misbehave
  3. Today our msan runs with default set flags, but usually we opt-in our bots into new features before we enabled them by default.

Regarding particular test, it claims clang is broken and XFAIL most of them any way. It's just XFAIL on timeout is not nice.

// XFAIL: tsan, msan
// XFAIL: tsan
// Hangs with msan.
// UNSUPPORTED: msan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why it is hanging on your CI? It was fine on our msan CI. It XFAIL because long double of x87_fp80 format has padding bits and at the moment, there is no way for clang to initialise those padding bits

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, now I want recover bots and investigate later, if there is an interest in support of these test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is x87_fp80 about msan? Msan, I guess, can just not take them into account?
I didn't look into details of float atomics, but if the type is visible to msan, it should be possible.

Any idea for tsan?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is about msan .

TSAN is because of the GitHub issue in the comments. basically according to the discussion it is a problem in TSAN itself where it does not intercept any calls into libatomic

@vitalybuka
Copy link
Collaborator Author

I can see that the tests are clearly super slow on your bot, I am just not certain what's the best way to address this without removing a lot of coverage for everyone else.

As I said half of them already XFAIL

I can sweep every new failing test under the rug https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_functions.sh#L356
but it will reduce probability of improving support for these usecases.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this to get your bots back to green, but @vitalybuka I'd really like you to try and dig into why these are hanging on msan in your setup (which we can't do ourselves cause it doesn't behave that way for us).

@ldionne ldionne merged commit 9be9f62 into main Nov 22, 2023
6 of 7 checks passed
@ldionne ldionne deleted the users/vitalybuka/spr/libctestmsan-fix-bots-after-67799 branch November 22, 2023 19:13
@vitalybuka
Copy link
Collaborator Author

vitalybuka commented Nov 22, 2023

I'm fine with this to get your bots back to green, but @vitalybuka I'd really like you to try and dig into why these are hanging on msan in your setup (which we can't do ourselves cause it doesn't behave that way for us).

Technically you can, the bot is on public GCE and the script which sets up our bot is published
How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

But I totally understand that this is a addition burden for you, and will do some investigation and get back with results.

@huixie90
Copy link
Contributor

The concern I have with disabling these tests as done in this patch is that it'll disable it for our CI as well, where these tests are running fine. And there's a lot of value in having these tests run under msan and tsan due to their nature. I can see that the tests are clearly super slow on your bot, I am just not certain what's the best way to address this without removing a lot of coverage for everyone else.
Unrelated question: what additional coverage do your bots provide on top of the testing we already do with sanitizers? As a general guideline, libc++ doesn't support CI setups outside of our own pre-commit CI, because it leads to these kinds of difficulties where we don't control the setup.

I really think for project like this there is more value in testing on more different environments than keeping a particular test alive, as you will get feedback from other platforms yearly.

I myself rely appreciate when someone runs our sanitizers on bots with some platforms I have no access. Even I have no control over them, it's much easy resolve regressions soon after commit than after it gets into release.

About coverage:

  1. we have aarch64 asan/msan/hwasan.
  2. clang is build from the tree one the same revision - we want to know when llvm changes causing sanitizers misbehave
  3. Today our msan runs with default set flags, but usually we opt-in our bots into new features before we enabled them by default.

Regarding particular test, it claims clang is broken and XFAIL most of them any way. It's just XFAIL on timeout is not nice.

maybe msan on aarch64 is bit different?

@vitalybuka
Copy link
Collaborator Author

maybe msan on aarch64 is bit different?

4 of UNSUPPORTED tests are there because of x86_64 Ubuntu build.

@vitalybuka
Copy link
Collaborator Author

vitalybuka commented Nov 22, 2023

Exactly same build does not reproduce on my workstation, with Linux distribution which is from the bot.

@huixie90
Copy link
Contributor

Exactly same build does not reproduce on my workstation, with Linux distribution which is from the bot.

This makes me think the timeout simply is because the test is a bit slow when compiling with msan?

there is a loop around the whole test point (to increase the likelyhood to fail in case a buggy source code). I wander reduce the loop count could help here

@huixie90
Copy link
Contributor

maybe msan on aarch64 is bit different?

4 of UNSUPPORTED tests are there because of x86_64 Ubuntu build.

Is the hanging on x86_64 or aarch64?

@vitalybuka
Copy link
Collaborator Author

Both, different tests

@vitalybuka
Copy link
Collaborator Author

I attached debugger and it infinitely spinning

while (!__self.compare_exchange_weak(__old, __new, __m, memory_order_relaxed)) {
#  ifdef _LIBCPP_COMPILER_CLANG_BASED
            if constexpr (__is_fp80_long_double()) {
              // https://github.com/llvm/llvm-project/issues/47978
              // clang bug: __old is not updated on failure for atomic<long double>::compare_exchange_weak
              // Note __old = __self.load(memory_order_relaxed) will not work
              std::__cxx_atomic_load_inplace(std::addressof(__self.__a_), &__old, memory_order_relaxed);
            }
#  endif
            __new = __operation(__old, __operand);
          }
          return __old;

@vitalybuka
Copy link
Collaborator Author

0x00007f687662439c in ?? () from /lib/x86_64-linux-gnu/libatomic.so.1
#1  0x00005560625417af in std::__1::__cxx_atomic_compare_exchange_weak[abi:ne180000]<long double>(std::__1::__cxx_atomic_base_impl<long double>*, long double*, long double, std::__1::memory_order, std::__1::memory_order) (__a=0x7fffc4b8c8b0, 
    __expected=0x7fffc4b8c5e0, __value=4.30000000000000004441, __success=std::__1::memory_order::seq_cst, 
    __failure=std::__1::memory_order::relaxed) at /b/my/libcxx_build_msan/include/c++/v1/__atomic/cxx_atomic_impl.h:446
#2  0x0000556062541484 in std::__1::__atomic_base<long double, false>::compare_exchange_weak[abi:ne180000](long double&, long double, std::__1::memory_order, std::__1::memory_order) (this=0x7fffc4b8c8b0, __e=@0x7fffc4b8c5e0: 3.10000000000000008882, 
    __d=4.30000000000000004441, __s=std::__1::memory_order::seq_cst, __f=std::__1::memory_order::relaxed)
    at /b/my/libcxx_build_msan/include/c++/v1/__atomic/atomic_base.h:79
#3  0x00005560625410e7 in std::__1::atomic<long double>::__rmw_op[abi:ne180000]<std::__1::atomic<long double>&, std::__1::plus<void>, std::__1::atomic<long double>::__fetch_add[abi:ne180000]<std::__1::atomic<long double>&>(std::__1::atomic<long double>&, long double, std::__1::memory_order)::{lambda(auto:1, auto:2, auto:3)#1}>(std::__1::atomic<long double>&, long double, std::__1::memory_order, std::__1::plus<void>, std::__1::atomic<long double>::__fetch_add[abi:ne180000]<std::__1::atomic<long double>&>(std::__1::atomic<long double>&, long double, std::__1::memory_order)::{lambda(auto:1, auto:2, auto:3)#1}) (__self=..., 
    __operand=1.19999999999999995559, __m=std::__1::memory_order::seq_cst, __operation=..., __builtin_op=...)
    at /b/my/libcxx_build_msan/include/c++/v1/__atomic/atomic.h:182
#4  0x0000556062540f1a in std::__1::atomic<long double>::__fetch_add[abi:ne180000]<std::__1::atomic<long double>&>(std::__1::atomic<long double>&, long double, std::__1::memory_order) (__self=..., __operand=1.19999999999999995559, 
    __m=std::__1::memory_order::seq_cst) at /b/my/libcxx_build_msan/include/c++/v1/__atomic/atomic.h:202
#5  0x0000556062540df0 in std::__1::atomic<long double>::fetch_add[abi:ne180000](long double, std::__1::memory_order) (
    this=0x7fffc4b8c8b0, __op=1.19999999999999995559, __m=std::__1::memory_order::seq_cst)
    at /b/my/libcxx_build_msan/include/c++/v1/__atomic/atomic.h:243
#6  0x000055606253fe6a in std::__1::atomic<long double>::operator+=[abi:ne180000](long double) (this=0x7fffc4b8c8b0, 
    __op=1.19999999999999995559) at /b/my/libcxx_build_msan/include/c++/v1/__atomic/atomic.h:262
#7  0x000055606253f646 in test_impl<long double, std::__1::type_identity_t> ()
    at /b/my/llvm-project/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp:41
#8  0x0000556062500d99 in test<long double> ()
    at /b/my/llvm-project/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp:90
#9  0x0000556062500d59 in main ()
    at /b/my/llvm-project/libcxx/test/std/atomics/atomics.types.generic/atomics.types.float/operator.plus_equals.pass.cpp:99
(gdb) s

@ldionne
Copy link
Member

ldionne commented Nov 22, 2023

@huixie90 This smells awfully like the padding bytes issue for long double, no? What else would make the compare exchange weak fail all the time? If so, we could consider disabling these tests for long double entirely until the Clang support isn't completely broken. This is actually kinda dangerous right now, people might be writing code using atomic<long double> and expecting it to work, when in reality it doesn't work properly and hasn't for probably 10 years.

@vitalybuka
Copy link
Collaborator Author

Buildbot is:

libatomic1:amd64                  13.1.0-2ubuntu2~23.04             amd64        support library providing __atomic built-in functions

Workstation:

libatomic1:amd64                                     13.2.0-4                                                    amd64        support library providing __atomic built-in functions

I copied compiled test binary and libc++ so files from the bot to the workstation and it does not hang.

@vitalybuka
Copy link
Collaborator Author

Note that padding bytes could not explain aarch64, failures which I didn't check yet.

@vitalybuka
Copy link
Collaborator Author

Also I replaced 13.1.0-2ubuntu2~23.04 with 13.2.0-4 and now all tests pass on the bot.
Looking into gcc repo, seems there was no libatomic changes between 13.1 and 13.2, but so files they look very different.

@vitalybuka
Copy link
Collaborator Author

Ubuntu 23.10 with libatomic 13.2.0-4 also hangs.
13.2.0-4 which does not hangs is from Debian

@huixie90
Copy link
Contributor

@huixie90 This smells awfully like the padding bytes issue for long double, no? What else would make the compare exchange weak fail all the time? If so, we could consider disabling these tests for long double entirely until the Clang support isn't completely broken. This is actually kinda dangerous right now, people might be writing code using atomic<long double> and expecting it to work, when in reality it doesn't work properly and hasn't for probably 10 years.

Yea the stacktrace clearly shows that it is long double. Yes we did have a big report from 2020 as we seen in the code review. the things I don’t understand is
1: we have a workaround for fetch_add and it is working on our CI envs but not here?
2. aarch64 long double has no padding

could we clarify in this build:

  1. The compiler is clang and OS is Linux
  2. does the non-msan build have the same issue?

In your debugger, is that is_fp80_long_double return true?

@vitalybuka
Copy link
Collaborator Author

Yea the stacktrace clearly shows that it is long double. Yes we did have a big report from 2020 as we seen in the code review. the things I don’t understand is
1: we have a workaround for fetch_add and it is working on our CI envs but not here?

As I wrote, I can reproduce exactly the same build only on Ubuntu, but no Debian
Copying over libatomic affect the issue

  1. aarch64 long double has no padding

Lets ignore that one, as it fails on the different tests. Maybe a different issue.

could we clarify in this build:

  1. The compiler is clang and OS is Linux

yes, clang is build from the same revision as your patch
Linux, I tried Ubuntu 23.04 and 23.10 - both reproducible
Same on Debian is not reproducible

  1. does the non-msan build have the same issue?

No

In your debugger, is that is_fp80_long_double return true?

Yes

I suspect you can reproduce on ubuntu of the same version using Docker or similar.

@vitalybuka
Copy link
Collaborator Author

vitalybuka commented Nov 23, 2023

aarch64 is actually good

both test are unexpectedly passed because there is not fp80, but they where marked as XFAIL

Unexpectedly Passed Tests (2):
  llvm-libc++-shared.cfg.in :: std/atomics/atomics.types.generic/atomics.types.float/exchange.pass.cpp
  llvm-libc++-shared.cfg.in :: std/atomics/atomics.types.generic/atomics.types.float/wait.pass.cpp

@huixie90
Copy link
Contributor

So the problem is specifically to

Ubuntu with MSAN build on x86 with fp80 long double

The loop that it got stucks in is a CAS loop where there is an existing problem for fp80 where compare-exchange always return false because of the padding. Inside the loop I have a workaround to copy the memory contents to “old” on a failing compare-exchange (and there is another exiting bug here that it does not seem to update the old for fp80) . Note normal assignment would not work because it does not assign the padding bits. I suspect MSAN is intercepting my workaround here. And yes we should really properly fix the issue rather than applying more and more workaround but we need the compiler side to help us.

ldionne pushed a commit that referenced this pull request Nov 23, 2023
Undo a part of #73152.

These test do not hang, but unexpectedlty pass on aarch64
https://lab.llvm.org/buildbot/#/builders/74/builds/23708

Tests work on aarch64 and probably any platform without fp80.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants