-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weak memory emulation does not respect C++20 SCfix #2301
Comments
I believe that false positives can be produced even when fences are not used. In particular, consider this code: static X: AtomicBool = AtomicBool::new(false);
static Y: AtomicBool = AtomicBool::new(false);
let a = thread::spawn(|| X.store(true, Relaxed));
let b = thread::spawn(|| Y.store(true, Relaxed));
let c = thread::spawn(|| { while !X.load(SeqCst) {} Y.load(SeqCst) });
let d = thread::spawn(|| { while !Y.load(SeqCst) {} X.load(SeqCst) });
a.join().unwrap();
b.join().unwrap();
let c = c.join().unwrap();
let d = d.join().unwrap();
assert!(c || d); playground, I had to loop it to get Miri to fail Currently, Miri can panic on it, but by the C++20 rules I don’t think it is allowed to. The execution of the code can be represented with this diagram:
Using
|
Cc @cbeuw would be great if you could take a look at the above. :) |
Here do you mean "succeeding" as in "success" or as in "second"? |
As in “second”. |
Okay, makes sense. And thanks for the detailed description! I don't know the C++ memory model well enough to check if this is all according to spec, but the explanation made sense in itself. :) |
@SabrinaJewson I think the logic deduction is a bit off. You started with the assumption of B loading false, reached a contradiction, thus proving that B cannot load false. But surely you have proved that B cannot load false for all possible values of B, and not "all possible values of B cannot simultaneously load false"?. Suppose A is I actually do agree with your conclusion that at least one of them must be true though:
So we get cppmem says all 4 combinations of (
The crux of the issue here is that if you have two SC loads and a non-SC store, the first SC load doesn't see the store and the second one sees it, then under C++20 the first load must precede the second load in S, whereas previously this is not necessarily the case (unless covered by other coherence rules, such as if the two loads happen-before each other) |
This is not hard to fix I think. Basically, if there's an SC load that reads from any store, then another SC load later in S cannot read before that store. This is because read from gives you a cob, read before gives you a cob, cob + cob is a cob, and a cob between SCs gives you an extra S edge, so we can't create the read before edge on the second load, otherwise we create a backwards S edge. When going through the store buffer, we need to check if the current store element has ever been SC loaded. If so, then we need to stop the search. This is very similar to the basic Coherent-RR actually, which states that if two loads on the same location happen before each other, then the later load cannot read a value earlier in MO than what the first load saw. Except in C++20, it's both hb and S. The remaining question is whether this is consistent with the standard. I don't have the ability to prove anything like the author did, though the modification on the race detector side for the release sequence changes seem to be in the same boat. In any case, since I'm restricting the "outdatedness" of values a load can see, the new possible behaviours is a strict subset of the current implementation, so it can't be more unsound than now. |
Yeah, if this can definitely only rule out some behaviors Miri currently produces (and not add more behavior), and if it rules out some behavior that we agree is wrong, then we should do it even if it also rules out other behavior that would be allowed (we already have plenty of that). |
Strengthen C++20 SC accesses `@SabrinaJewson` noted in #2301 that Miri could produce behaviours forbidden under C++20 even without SC fences. Due to the added coherence-ordered before relationship which is created from read from and read before, plus the fact that coherence-ordered before between SC operations must be consistent with the Global Total Order S, in C++20 if there's an SC load that reads from any store, then a later SC load cannot read before that store. This PR adds this restriction
All right, that particular issue should be fixed by #2512. :) The wider problem of not respecting SC fences properly remains, though. This is somewhat surprising since, to my knowledge, SC fences are actually much more well-behaved in C++20 than SC accesses. For example, the operational Promising Semantics model supports SC fences but not SC accesses. I wonder if we can copy the SC fence approach from the promising semantics, or are there too many other differences that make this impractical? |
SC fences are better behaved. Adapting Promising Semantics' SC fences is possible and can actually simplify our code. The bit I'm not fully comfortable with is the interaction between SC fences and SC accesses, which are fairly intertwined. I'm having another look at the C++20 SC definitions. I think the new wording is relatively friendly to translate to operational models since it's worded in terms of "if this happened, then there's an S edge", so all we need to focus on is "don't create a backwards S edge by reading the wrong store", since we always have S naturally as the operational steps. But because coherence-ordered before is transitive, it might be tricky to cover all cases. |
Something we could do as mitigation for now is detect when the same location is used for both SC and non-SC accesses, and print a warning when that happens. It seems (based on what a colleague told me) that SCfix only affects the behavior of programs that mix SC and non-SC on the same location. |
C++20 SCFix comes in three parts:
So I believe that warning would only affect part (1). It still may be helpful to have, since users would be warning that Miri cannot test all branches of their code, but I don’t think it would help w.r.t. correctness. |
Your earlier example mixes SC and non-SC on the same location, doesn't it? I don't know this space enough to judge, I can just relay what other people tell me. 🤷 |
@SabrinaJewson do you have an example of a piece of code where Miri currently performs an execution that is incorrect under SCFix, and that does not mix SC and non-SC accesses on the same location? |
I am struggling to come up with such an example. I can easily make an example where I believe C++11 would allow the behaviour but it is forbidden by C++20, for example: static CARRIER: AtomicBool = AtomicBool::new(false);
static CHAINER: AtomicBool = AtomicBool::new(false);
static TARGET: AtomicBool = AtomicBool::new(false);
fn main() {
loop {
CARRIER.store(false, atomic::Ordering::Relaxed);
CHAINER.store(false, atomic::Ordering::Relaxed);
TARGET.store(false, atomic::Ordering::Relaxed);
let happens_before_thread = thread::spawn(|| {
let target = TARGET.load(atomic::Ordering::Relaxed);
CARRIER.store(true, atomic::Ordering::Release);
target
});
let first_fence_thread = thread::spawn(|| {
while !CARRIER.load(atomic::Ordering::Acquire) {}
atomic::fence(atomic::Ordering::SeqCst);
CHAINER.store(true, atomic::Ordering::Relaxed);
});
let second_fence_thread = thread::spawn(|| {
while !CHAINER.load(atomic::Ordering::Relaxed) {}
atomic::fence(atomic::Ordering::SeqCst);
TARGET.store(true, atomic::Ordering::Relaxed);
});
let target = happens_before_thread.join().unwrap();
first_fence_thread.join().unwrap();
second_fence_thread.join().unwrap();
assert!(!target);
}
}
use std::sync::atomic;
use std::sync::atomic::AtomicBool;
use std::thread; Here, C++11 would allow
In contrast C++20 disallows this execution because any operation that happens-before the first fence (i.e. However, this is not particularly helpful because even with the fences removed I could not get hardware, Miri, TSan or Loom to trigger on this code. It’s also true that depending on your interpretation of OOTA even in C++11 the execution without fences isn’t allowed in the first place. One can replace the communication used in Do you have an example of Miri producing an invalid execution that does mix |
Neither do I. 😂 It's basically an implementation of this paper, with slight adjustments described here and here.
I was under the impression that your example here qualifies? |
Ah, looks like #2512 fixed that particular example. Sadly I don't know enough about all this to come up with new examples. I just remember @cbeuw saying that this is not yet a proper fix. @cbeuw maybe you can help us with such an example? |
For the record, here is a program where the assertion would be allowed to fail Under C++11 but no more with SCfix. However it does not fail in Miri, so this does not help to determine whether there still is a bug to fix here. EDIT: The test had a bug oops. And the fixed test does fail in Miri! static X: AtomicBool = AtomicBool::new(false);
static Y: AtomicBool = AtomicBool::new(false);
fn main() {
for _ in 0..100 {
X.store(false, atomic::Ordering::Relaxed);
Y.store(false, atomic::Ordering::Relaxed);
let thread1 = thread::spawn(|| {
let a = X.load(atomic::Ordering::Relaxed);
atomic::fence(Ordering::SeqCst);
let b = Y.load(atomic::Ordering::Relaxed);
(a, b)
});
let thread2 = thread::spawn(|| {
X.store(true, Ordering::Relaxed);
});
let thread3 = thread::spawn(|| {
Y.store(true, Ordering::Relaxed);
});
let thread4 = thread::spawn(|| {
let c = Y.load(atomic::Ordering::Relaxed);
atomic::fence(Ordering::SeqCst);
let d = X.load(atomic::Ordering::Relaxed);
(c, d)
});
let (a, b) = thread1.join().unwrap();
thread2.join().unwrap();
thread3.join().unwrap();
let (c, d) = thread4.join().unwrap();
let bad = a == true && b == false && c == true && d == false;
assert!(!bad);
}
}
use std::sync::atomic::{self, Ordering};
use std::sync::atomic::AtomicBool;
use std::thread; |
Looks like #2301 (comment) does fail under Miri. However, I don't think the asserted execution is forbidden under SC Fix. An execution is only forbidden if there is a contradiction between the observed values and S. However, in the forbidden execution, none of the In other words, we need
Therefore S contains no Relaxed accesses and the execution is actually allowed under SC Fix. |
I'm out of my depth here.^^ I was just told that this is the canonical example for what SCfix is intended to rule out. I'll try to get an expert into this thread. :) |
The rf edges and rb edges in this drawing are all reversed. We do have "cob" from "Y.load false" to "Y.load true":
|
|
By the way, genmc agrees that the bad execution from #2301 (comment) is forbidden. equivalent c codeRun with
#include <stdlib.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>
atomic_bool X = false;
atomic_bool Y = false;
bool a = false;
bool b = false;
bool c = false;
bool d = false;
void *thread_1(void *unused)
{
a = atomic_load_explicit(&X, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
b = atomic_load_explicit(&Y, memory_order_relaxed);
return NULL;
}
void *thread_2(void *unused)
{
atomic_store_explicit(&X, true, memory_order_relaxed);
return NULL;
}
void *thread_3(void *unused)
{
atomic_store_explicit(&Y, true, memory_order_relaxed);
return NULL;
}
void *thread_4(void *unused)
{
c = atomic_load_explicit(&Y, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
d = atomic_load_explicit(&X, memory_order_relaxed);
return NULL;
}
int main()
{
pthread_t t1, t2, t3, t4;
if (pthread_create(&t1, NULL, thread_1, NULL)) abort();
if (pthread_create(&t2, NULL, thread_2, NULL)) abort();
if (pthread_create(&t3, NULL, thread_3, NULL)) abort();
if (pthread_create(&t4, NULL, thread_4, NULL)) abort();
if (pthread_join(t1, NULL)) abort();
if (pthread_join(t2, NULL)) abort();
if (pthread_join(t3, NULL)) abort();
if (pthread_join(t4, NULL)) abort();
bool bad = a == true && b == false && c == true && d == false;
assert(!bad);
return 0;
} |
Indeed I have drawn the rb edges the wrong way round 😅. The execution is forbidden and Miri is wrong here |
Okay, nice, we have an example. :) Now, the next questions are:
|
Our weak memory emulation does not respect the C++20 SCfix, so we could sometimes produce values that are not actually possible on the real machine. This can lead to false UB reports when SeqCst fences are used.
Unfortunately, no literature exists at the time of writing which proposes an implementable and C++20-compatible relaxed memory model that supports all atomic operation existing in Rust.
The text was updated successfully, but these errors were encountered: