Possible Data Race detected by TSAN #1820
**TLDR**

This is a false positive and by design, but TSAN is unable to infer that intent since it is a complex interaction. The change to …

**Background**

PW is a wrapper of the Map's value to hold eviction metadata. This includes the key and doubly linked list pointers for LRU, which then allows an eviction to perform the map lookup on removal in O(1) time. Caffeine uses a per-entry state machine to resolve races caused by non-deterministic ordering. The entry is …

**Explanation**

The entry is only modified within its instance lock, which ensures exclusive access and inserts memory barriers for data visibility on entry and exit. The … Concurrently, an eviction is notified of the garbage collection by polling the … If the eviction raced with an explicit remove, the …

**Summary**

Do you have any questions or further asks? I believe we can close this, but let me know if you disagree or have a recommendation.
Wow, thanks for responding here, and in such detail. I think the TSAN stack traces are complaining about a write from the constructor of WeakValueReference racing against a read happening in eviction. This made no sense to me at all, but then I read https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.5-110. The way I read that is, for a non-final field, it can be the case that the constructor returns first and the partially constructed value can be passed to other threads before the constructor finishes assigning its non-final fields. Since this is happening in the constructor, the locking and guarded mutations that can happen don't seem relevant. The constructor has the ability to mutate the value at some arbitrarily later moment after it's been consumed by other threads. I didn't fully trace the code, but I don't see, and I don't think TSAN sees, a barrier preventing that from happening.
Oh, that's a really good argument. However, wouldn't the fix be to use a stronger …? Can you test using …?
`setValue` already uses `setRelease`:

```java
public final void setValue(V value, ReferenceQueue referenceQueue) {
```

The plain `.get` here:

```java
public final Object getKeyReference() {
```

acts like a plain read and ignores the barrier set by the `setRelease`.
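To make the shape of the debate concrete, here is a simplified, self-contained sketch of the pattern being discussed. The class and member names are illustrative only, not Caffeine's actual code: a value published via a `setRelease` store, next to a key field that is read plainly with no acquire barrier.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Simplified sketch of the pattern being debated (illustrative names, not
// Caffeine's actual code): the value is published with setRelease, while the
// key reference is a plain, unguarded read.
class WeakValueHolder<K, V> {
  static final VarHandle VALUE;
  static {
    try {
      VALUE = MethodHandles.lookup()
          .findVarHandle(WeakValueHolder.class, "value", WeakReference.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  K keyReference;          // non-final, written plainly in the constructor
  WeakReference<V> value;  // accessed through the VarHandle

  WeakValueHolder(K keyReference, V value, ReferenceQueue<V> queue) {
    this.keyReference = keyReference;
    setValue(value, queue);
  }

  final void setValue(V newValue, ReferenceQueue<V> queue) {
    // Release store of the wrapped value
    VALUE.setRelease(this, new WeakReference<>(newValue, queue));
  }

  final Object getKeyReference() {
    return keyReference;   // plain read: no acquire barrier on this side
  }
}
```

The question in this thread is whether that plain read on the last line can observe a partially constructed object despite the writer's release store.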
Oh right, then the only plain write is in the PW constructor. That's safe because it is not yet published to another thread, and the memory barrier when the synchronized mutex is unlocked handles that. Then shouldn't the plain read be fine, since the release barrier forced the WeakReference's fields to be set by the time it was visible to another thread?
I think the setRelease documentation is a bit misleading: "Sets the value of a variable to the newValue, and ensures that prior loads and stores are not reordered after this access." That needs to be interpreted in the context of other operations that honor ordering constraints. From the class documentation of VarHandle:

> "Access modes control atomicity and consistency properties. Plain read (get) and write (set) accesses are guaranteed to be bitwise atomic only for references and for primitive values of at most 32 bits, and impose no observable ordering constraints with respect to threads other than the executing thread. Opaque operations are bitwise atomic and coherently ordered with respect to accesses to the same variable. In addition to obeying Opaque properties, Acquire mode reads and their subsequent accesses are ordered after matching Release mode writes and their previous accesses. In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other."

The key bit in here is "Plain read (get) and write (set) accesses are guaranteed to be bitwise atomic only for references and for primitive values of at most 32 bits, and impose no observable ordering constraints with respect to threads other than the executing thread." To get the happens-before relationship, it sounds to me like getOpaque is enough, and TSAN seems to think it's enough. (Though Gemini is telling me to go with getAcquire for clarity and certainty of portability.)
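The release/acquire pairing the VarHandle documentation describes can be sketched in a minimal, hypothetical example (all names here are made up for illustration): the writer's release store only orders the constructor's plain stores for readers that use a matching acquire read.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Minimal sketch (hypothetical names) of the pairing the VarHandle docs
// describe: a release store on the writer orders the Payload constructor's
// plain stores before the reference store, and a matching acquire read on
// the reader is guaranteed to observe them.
class Publication {
  static final class Payload {
    int x; // deliberately non-final
    Payload(int x) { this.x = x; }
  }

  static final VarHandle BOX;
  static {
    try {
      BOX = MethodHandles.lookup()
          .findVarHandle(Publication.class, "box", Payload.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  Payload box;

  void publish(int value) {
    // Release: the constructor's store to x cannot be reordered after
    // this store of the reference.
    BOX.setRelease(this, new Payload(value));
  }

  Integer readAcquire() {
    // Acquire: pairs with setRelease, so a non-null Payload is fully visible.
    Payload p = (Payload) BOX.getAcquire(this);
    return (p == null) ? null : p.x;
  }
}
```

Whether a plain or opaque read in place of `getAcquire` is also sufficient is exactly the point under debate in this thread.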
I'll have to revisit this in the coming days, as I am a bit worn out today, because I still think that at a purely technical level the plain reads are fine. I'm also not against using an opaque read since the cost is not visible, but I feel like the reasons would be me saying "I'm not sure, hope this is right". I'd rather be able to be more confident while also being pragmatic to satisfy concerns. If we decide to use opaque for pragmatic reasons, we should feel okay commenting that so it's understood why that mode was decided on (correctness, tooling, unsure but seems safer, etc).

My understanding is that a release store means that when the thread constructs the …

The 32-bit aspect is in reference to torn reads of long/double primitives, and I recall that with the removal of the last 32-bit architecture from OpenJDK, they had planned on updating the spec to assume 64-bit minimum values (probably wrt Valhalla). The opaque mode was strengthened to guarantee bitwise atomicity for these cases, but that is a historical oddity that should not be relevant here. The meaningful remainder of opaque is program order within the single thread, enforcing a local restriction on reordering so that certain reads/writes must be performed in the program's order and not by the compiler's optimized choice. This local ordering would have no bearing on cross-thread synchronization, such as whether it could observe a partial object published by another thread.

These are the references that I usually turn to, and I will try to review them. You might want to skim them, as they are very insightful and this is a tricky subject.
Unfortunately, none of the fields in `com.github.benmanes.caffeine.cache.References.WeakValueReference` are final.
Across different threads, plain reads are mostly the exception (though they are done deliberately in some places). You almost always need either a volatile read or one of the special UNSAFE / VarHandle methods that put in read barriers. Thanks for taking the time to look up and point out your references. It's difficult to know what references are trustworthy. I see an exact match for our situation here: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-semi-sync
I’ll review later but ChatGPT disagrees with you, see below. This project’s CI uses both x86 and arm64’s weaker memory model for validation.

> Your scenario does not align with the pitfall described in the article because your design uses explicit release-acquire semantics (setRelease on the writer) to safely publish the WeakValueReference object, whereas the code in the article relies on insufficient synchronization (a synchronized writer but an unsynchronized reader). Here’s a detailed comparison of your scenario and the pitfall:
>
> **Key Issue in the Article’s Example**
>
> In the article: … In short, without proper synchronization on both sides, the Box fields may appear to be in an incomplete state, even though the Box reference itself is fully visible.
>
> **How Your Scenario Differs**
>
> …
>
> **Why the Article’s Pitfall Doesn’t Apply to Your Case**
>
> The problem in the article arises because: … In contrast, your scenario avoids this pitfall because: …
>
> **Takeaway**
>
> The pitfall in the article is a classic case of insufficient synchronization. Your scenario, however, is well-designed with explicit release semantics, ensuring proper object publication. Therefore, the concerns raised in the article about reordering and field visibility do not apply to your case.
I meant that final is translated into a hardware concept, a release barrier. We do this explicitly to get the same effect. Java’s WeakReference referent field is also not final, nor volatile, and is a plain read. That doesn’t mean it’s wrong, just that the language doesn’t explicitly model the hardware modes being relied on in subtle ways.
It would be cool if you could share the prompt you used. I think that what you said would work if the fields in WeakValueReference were final, but I don't think it works here. I'm very wary of ChatGPT as it tends to give you the result you want rather easily. I tried the following prompt in both ChatGPT and Gemini and it gave the same response.

Say I have something like:

```java
class Box {
  Box() { ...
```

…where `HANDLE` is a VarHandle to a `Box` field named `box` in `obj`. Can another thread, by doing a plain read of `box.x`, observe a 0?

Both said yes, it is possible to see a 0, and they both give the same explanation: "Crucially: The setRelease only guarantees that the write to the field in obj happens-before subsequent acquire-semantics reads (like getAcquire or getOpaque) of that same field in another thread. It does not provide any guarantees about the visibility of writes that happened inside the Box constructor to other threads that do not use acquire semantics to read the field."

What is it that I'm not understanding?
The reason why I thought the scenarios differ is because of where the …

The synchronized example relies upon the unlock store to be visible. Entering it has … That example differs from Caffeine's, where we use the release store on the variable of interest and are not performing plain stores to shared visible memory prior to that. The newly created …

Here's the chatgpt session from yesterday.
Thank you for sharing the prompt. I think there are two places where ChatGPT gets confused.
I agree that the Node is modified under lock, but it is not locked in `evictEntry`, where the data race occurs. The only lock held at that point is `evictionLock`. However, `evictionLock` is not held by the thread performing `put` and doing the node creation. If you lock it when writing, but read it without locking, it still causes a data race.
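The asymmetry described above can be reduced to a tiny hypothetical class (names invented for illustration): a field written under one lock but read with no lock at all is still a data race under the Java memory model, regardless of what the writer does.

```java
// Hypothetical sketch of the asymmetry described above: the writer holds a
// lock, but the reader takes none, so accesses to `state` are still a data
// race by the Java memory model's definition.
class LockAsymmetry {
  private final Object writeLock = new Object();
  private int state;

  void guardedWrite(int v) {
    synchronized (writeLock) {  // writer holds a lock...
      state = v;
    }
  }

  int unguardedRead() {
    return state;               // ...but this plain read is unguarded: racy
  }
}
```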
I think ChatGPT starts to think that the fields are final here. It's true that setRelease would be enough if the fields were final, but that is not the case here.
A final field has release store semantics whereas a non-final one would default to plain. The …

If I get some time, I'll try to reproduce the Box sample and see if it fails on my macbook M3 due to arm64's weaker memory model (though Apple offers the stronger x86 one, so unsure). If not, I'll try to use the CircleCI arm64 test runner we use for this project. If it can fail, then we can change it to the setRelease and show what happens on real hardware. Right now it's a debate of whether TSAN is correct; all other TSAN reports have been invalid, so I don't discount it, but I also don't trust it more than the actual memory model.
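The final vs. non-final contrast can be sketched as follows (hypothetical classes for illustration). Per JLS 17.5, a final field is "frozen" at the end of the constructor, so any thread that obtains the object reference sees the final field's value even through a racy publication; a plain field has no such guarantee on its own.

```java
// Sketch of the contrast: per JLS 17.5, FinalBox.x is guaranteed visible to
// any thread that sees a FinalBox reference, even under racy publication.
// PlainBox.x has no such freeze-at-end-of-constructor guarantee; its safe
// publication must come from the mechanism that publishes the object.
final class FinalBox {
  final int x;  // freeze action at end of constructor
  FinalBox(int x) { this.x = x; }
}

final class PlainBox {
  int x;        // plain store: no publication guarantee on its own
  PlainBox(int x) { this.x = x; }
}
```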
My understanding of the memory model (not claiming to be an expert here) is that while that's true, you still need to use a read with some kind of acquire semantics to actually see the preceding stores from a different thread.
Thanks, that experiment sounds like it'd be really helpful. It's so weird when you actually observe memory visibility issues in the real world.
My understanding is that the memory model of …

If there were no cache coherence protocol and it was just raw memory and cache access of raw bits, then I agree a partial object could be observed. Thankfully the hardware still has the plain read participate, observing either the most recent version of the value in the current thread's cache (registers) or the most recent value written by another thread as the changes are propagated. It will always have a coherent and valid state of memory, but without synchronization there won't be a timing and staleness guarantee. If the acquire semantics were needed, then …
The test passed. The article's sample is more fully fleshed out as AdvancedJMM_06_SemiSynchronized in the jcstress repository. Prior to noticing that, I set it up in this repository and switched to that code since the api changed a bit. Unfortunately the gradle plugin is unsupported with deprecations, so I will not merge this in. I added the following scenario.

```java
@JCStressTest
@State
@Outcome(id = {"-1", "4"}, expect = ACCEPTABLE, desc = "Boring")
@Outcome(expect = FORBIDDEN, desc = "Now forbidden")
public static class VarHandleRacy {
  public static final VarHandle HANDLE;
  static {
    try {
      HANDLE = MethodHandles.lookup()
          .findVarHandle(VarHandleRacy.class, "c", Composite.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  Composite c;

  @Actor
  void actor() {
    HANDLE.setRelease(this, new Composite(1));
  }

  @Actor
  void observer(I_Result r) {
    var lc = (Composite) HANDLE.get(this);
    r.r1 = (lc != null) ? lc.get() : -1;
  }
}
```

The console log shows only RacyRead and RacyWrite fail, while NonRacy and VarHandleRacy pass. You can download this html report for more details and apply this patch to run it.
Amazing. Thank you. I learned a lot going through this. It might be nice if there were a comment in `com.github.benmanes.caffeine.cache.node.AddKey` explaining why the plain read is actually okay. It's pretty subtle.
This is a common misunderstanding. As a simple-ish example, consider the difference between final fields (which actually have that guarantee) and a varhandle release write. On ARM, using the Android RunTime (which I'm most familiar with and is easily accessible at godbolt.org), ordinary loads and stores are implemented with …

One way to model how this can go wrong in your head, without reading ARM's 100mb PDF on their architecture, is through a theoretical model of cache invalidation. Suppose you have two memory addresses, A and B (starting initialized to 0), and two processor cores with separate L1 and shared L2 cache.

```asm
str 1, [A]
stlr 1, [B]
```

On another core:

```asm
ldar w0, [B]
ldr w1, [A]
```

This next explanation is not how it actually works, but is simplified to show how a processor using the arm instruction set could fail publication: … The values the reader can see for …

What if we don't load-acquire?

```asm
ldr w0, [B]
ldr w1, [A]
```

For one thing, out-of-order execution means that this could just load A before B. But imagine for the sake of argument that there is a data dependency between the load of A and B, such that you do have a guaranteed read order:

```asm
str 1, [A]
stlr A, [B]
```

On another core:

```asm
ldr w0, [B]
ldr w1, [w0]
```

If the reading processor has address A in L1 cache and finds B in the shared L2 cache, it would be perfectly valid for it to give you …

```asm
str 1, [A]
dmb ishst
str A, [B]
```

On another core:

```asm
ldr w0, [B]
ldr w1, [w0]
```

This gives the same guarantee as before, and is what people often expect to happen with release memory order. In our simplified cache model, the …

In short, release memory order is only meaningful if you do an acquire when you load, and vice versa.
TSAN reported the following data race.

Using `getOpaque` in `com.github.benmanes.caffeine.cache.node.AddKey.addIfCollectedValue` instead of a plain `get` fixes our TSAN build. Let me know what you think.
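The kind of change being proposed can be sketched with an illustrative class (these names are invented, not the actual AddKey source): replacing the plain field read with a `getOpaque` through a VarHandle, which is the read mode TSAN accepts.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Illustrative sketch (not the actual AddKey source) of the proposed change:
// replacing a plain field read with VarHandle.getOpaque, which TSAN accepts.
class KeyHolder {
  static final VarHandle KEY;
  static {
    try {
      KEY = MethodHandles.lookup()
          .findVarHandle(KeyHolder.class, "key", Object.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  Object key;

  Object getKeyReferencePlain() {
    return key;                   // plain read, flagged by TSAN
  }

  Object getKeyReferenceOpaque() {
    return KEY.getOpaque(this);   // opaque read, satisfies TSAN
  }
}
```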