Possible Data Race detected by TSAN #1820

Closed

aoeui opened this issue Jan 7, 2025 · 19 comments

@aoeui

aoeui commented Jan 7, 2025

TSAN reported the following data race.

WARNING: ThreadSanitizer: data race (pid=4522)
  Read of size 4 at 0x0000bedf0004 by thread T41:
    #0 com.github.benmanes.caffeine.cache.References$WeakValueReference.getKeyReference()Ljava/lang/Object; References.java:267 
    #1 com.github.benmanes.caffeine.cache.PW.getKeyReference()Ljava/lang/Object; PW.java:62 
    #2 com.github.benmanes.caffeine.cache.BoundedLocalCache.evictEntry(Lcom/github/benmanes/caffeine/cache/Node;Lcom/github/benmanes/caffeine/cache/RemovalCause;J)Z BoundedLocalCache.java:1032 
    #3 com.github.benmanes.caffeine.cache.BoundedLocalCache.drainValueReferences()V BoundedLocalCache.java:1763 
    #4 com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(Ljava/lang/Runnable;)V BoundedLocalCache.java:1722 
    #5 com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(Ljava/lang/Runnable;)V BoundedLocalCache.java:1662 
    #6 com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run()V BoundedLocalCache.java:3893 
    #7 com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.exec()Z BoundedLocalCache.java:3880 
    #8 java.util.concurrent.ForkJoinTask.doExec()I ForkJoinTask.java:387 
    #9 java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V ForkJoinPool.java:1312 
    #10 java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I ForkJoinPool.java:1843 
    #11 java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V ForkJoinPool.java:1808 
    #12 java.util.concurrent.ForkJoinWorkerThread.run()V ForkJoinWorkerThread.java:188 
    #13 (Generated Stub) <null> 

Previous write of size 4 at 0x0000bedf0004 by thread T101 (mutexes: write M0):
    #0 com.github.benmanes.caffeine.cache.References$WeakValueReference.<init>(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/ref/ReferenceQueue;)V References.java:262 
    #1 com.github.benmanes.caffeine.cache.PW.setValue(Ljava/lang/Object;Ljava/lang/ref/ReferenceQueue;)V PW.java:86 
    #2 com.github.benmanes.caffeine.cache.BoundedLocalCache.put(Ljava/lang/Object;Ljava/lang/Object;Lcom/github/benmanes/caffeine/cache/Expiry;Z)Ljava/lang/Object; BoundedLocalCache.java:2399 
    #3 com.github.benmanes.caffeine.cache.BoundedLocalCache.putIfAbsent(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; BoundedLocalCache.java:2285 

Using getOpaque in com.github.benmanes.caffeine.cache.node.AddKey.addIfCollectedValue instead of plain get fixes our TSAN build.

Let me know what you think.

@ben-manes
Owner

TLDR

This is a false positive and by design, but TSAN is unable to infer that intent since it is a complex interaction. The change to getOpaque doesn't resolve anything from a correctness perspective, but it does hint to TSAN that this weaker visibility is intentional; TSAN then seems to trust the author and treat it as a benign data race. While the change has little performance impact, it is a very implicit way to suppress a false positive, and a developer might misinterpret it as being there for correctness and be confused. I'd recommend an explicit suppression, which I think has to live in your configuration file, so there's nothing we should reasonably change here.

Background

PW is a wrapper around the Map's value that holds eviction metadata. This includes the key and the doubly linked list pointers for LRU, which allows an eviction to perform the map lookup on removal in O(1) time. The keyReference is a direct field on the entry, either the bare K or a WeakReference<K> depending on whether the cache is configured with strong or weak keys, allowing a weakly held key to be garbage collected if there are no strong references in the application. In a hash map the key must never be modified, as that violates the contract, so it is assumed to be immutable while in the map.

Caffeine uses a per-entry state machine to resolve races caused by non-deterministic ordering. The entry is alive if it is in both the hash map and the eviction policy, retired if removed from the hash map but still in the eviction policy, and dead if removed from both data structures. The entry's key field is used to encode this state to minimize memory overhead and is never modified while in the alive state, which is represented by the bare key reference. When the entry is removed from the map, e.g. by Cache.invalidate(key), the key is marked as retired using a sentinel object. Similarly, a sentinel object is used for the dead state when the entry is fully discarded. Any other cache operation that raced to read the entry from the hash map prior to removal can validate its operation by checking that the entry is alive, e.g. after obtaining the synchronized instance lock, and retry its map read to get a fresh or absent instance.
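A minimal sketch of that lifecycle encoding, with hypothetical names (Caffeine's real node classes are code-generated and differ in detail):

// Hypothetical sketch of the per-entry state machine described above.
final class NodeSketch<K, V> {
    static final Object RETIRED = new Object(); // removed from the hash map, still in the policy
    static final Object DEAD = new Object();    // removed from both data structures

    volatile Object keyReference; // the bare key (or WeakReference<K>) while alive
    volatile V value;             // null once a weak value has been collected

    boolean isAlive() {
        Object key = keyReference;
        return (key != RETIRED) && (key != DEAD);
    }

    // A racing operation re-checks liveness under the entry's lock and, if the
    // entry was retired or killed, retries its map read for a fresh instance.
    void retire() { keyReference = RETIRED; }
    void die()    { keyReference = DEAD; }
}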

Explanation

The entry is only modified within its instance lock, which ensures exclusive access and inserts memory barriers for data visibility on entry and exit. The put holds this lock when it sets the value held by PW, in this case creating a new WeakValueReference due to the strong keys / weak values cache configuration. The key is stored in the WeakValueReference so that when eviction polls the ReferenceQueue it can look up the mapping, and since the key is not modifiable the PW doesn't need to hold it redundantly (e.g. for other cases like LRU) and retrieves it from the value when queried.

In this trace, the put operation reads the PW instance from the map after its weak value was garbage collected. Since the entry was not yet evicted and is held under the lock, the mapping can be reused and the entry's value is updated. This effectively resurrects the node, since it is still in the alive state.

Concurrently, an eviction is notified of the garbage collection by polling the ReferenceQueue and attempts to discard the entry. It reads the keyReference outside of any lock, which is the same reference whether from the old or the new WeakValueReference. This is then used in an exclusive computeIfPresent to remove the entry with RemovalCause.COLLECTED. This operation validates that it has the same PW instance and that the entry is still eligible for eviction, e.g. that the value is null. If not, the eviction is abandoned because the entry was resurrected. If the removal succeeds, it sets the state to retired, which replaces the keyReference.

If the eviction raced with an explicit remove, the computeIfPresent might find no mapping, or a different PW instance if replaced in the strong key case, and then be abandoned when it fails the verification. The absent mapping might be because the keyReference stayed removed or because the eviction thread observed the retired sentinel instance. Since the sentinels are never mapped in the cache, either way it would skip over the computeIfPresent and not mark the entry as resurrected. The eviction handling would then eagerly discard the entry from the policy's data structures and skip the removal notification, as that was handled by the user's explicit removal. The user's operation would submit a RemovalTask that would normally unlink the entry from the policy data structures, but in this case it simply no-ops since its work was already handled by the eviction.
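A rough sketch of that eviction-side validation, reusing the hypothetical NodeSketch above (illustrative only; the real evictEntry is more involved, and this assumes java.util.concurrent.ConcurrentHashMap):

// Validate under the map's exclusive compute that the same entry is present
// and still eligible before unmapping it; otherwise abandon the eviction.
static <K, V> boolean evictEntrySketch(
        ConcurrentHashMap<Object, NodeSketch<K, V>> data, NodeSketch<K, V> node) {
    boolean[] removed = { false };
    Object keyReference = node.keyReference; // read outside of any lock
    data.computeIfPresent(keyReference, (key, current) -> {
        if (current != node) {
            return current; // a different mapping won the race
        }
        synchronized (current) {
            if (current.value != null) {
                return current; // resurrected by a concurrent put
            }
            current.retire();
            removed[0] = true;
            return null; // unmap the entry
        }
    });
    return removed[0];
}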

Summary

Do you have any questions or further asks? I believe we can close this but let me know if you disagree or have a recommendation.

@aoeui
Author

aoeui commented Jan 8, 2025

Wow, thanks for responding here, and in such detail.

I think the TSAN stack traces are complaining about a write from the constructor of WeakValueReference racing against a read happening in eviction. This made no sense to me at all, but then I read https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.5-110.

The way I read that, for a non-final field it can be the case that the constructor effectively returns first, and the partially constructed object can be passed to other threads before the constructor's assignments to its non-final fields are visible.

Since this is happening in the constructor, the locking and guarded mutations elsewhere don't seem relevant. The constructor's writes can land at some arbitrarily later moment, after the object has already been consumed by other threads.

I didn't fully trace the code, but I don't see (and I don't think TSAN sees) a barrier preventing that from happening.

@ben-manes
Owner

ben-manes commented Jan 8, 2025

Oh, that's a really good argument. However, wouldn't the fix then be to use a stronger write in setValue instead of the plain one? I don't think getOpaque would fix that.

Can you test using setRelease? I think that would be a reasonable fix.

@aoeui
Author

aoeui commented Jan 8, 2025

setValue already uses setRelease:

public final void setValue(V value, ReferenceQueue<V> referenceQueue) {
  Reference<V> ref = (Reference<V>) VALUE.get(this);
  VALUE.setRelease(this, new WeakValueReference<V>(getKeyReference(), value, referenceQueue));
  ref.clear();
}

The plain .get here:

public final Object getKeyReference() {
  WeakValueReference<V> valueRef = (WeakValueReference<V>) VALUE.get(this);
  return valueRef.getKeyReference();
}

is a plain read and does not pair with the release barrier from the setRelease.

@ben-manes
Owner

Oh right, then the only plain write is in the PW constructor. That's safe because the instance is not yet published to another thread, and the memory barrier when the synchronized mutex is unlocked handles that.

Then shouldn't the plain read be fine, since the release barrier forced the WeakValueReference's fields to be set by the time it was visible to another thread?

@aoeui
Author

aoeui commented Jan 8, 2025

Then shouldn't the plain read be fine, since the release barrier forced the WeakValueReference's fields to be set by the time it was visible to another thread?

I think the setRelease documentation is a bit misleading: "Sets the value of a variable to the newValue, and ensures that prior loads and stores are not reordered after this access." That needs to be interpreted in the context of other operations that honor ordering constraints.

From the class documentation of VarHandle:

"Access modes control atomicity and consistency properties. Plain read (get) and write (set) accesses are guaranteed to be bitwise atomic only for references and for primitive values of at most 32 bits, and impose no observable ordering constraints with respect to threads other than the executing thread. Opaque operations are bitwise atomic and coherently ordered with respect to accesses to the same variable. In addition to obeying Opaque properties, Acquire mode reads and their subsequent accesses are ordered after matching Release mode writes and their previous accesses. In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other."

The key bit in here is "Plain read (get) and write (set) accesses are guaranteed to be bitwise atomic only for references and for primitive values of at most 32 bits, and impose no observable ordering constraints with respect to threads other than the executing thread."

To get the happens-before relationship, it sounds to me like getOpaque is enough, and TSAN seems to think it's enough. (Though Gemini is telling me to go with getAcquire for clarity and certainty of portability.)
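For reference, here is a sketch of the three read strengths being debated (illustrative only, not Caffeine's code):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Modes {
    static final VarHandle REF;
    static {
        try {
            REF = MethodHandles.lookup().findVarHandle(Modes.class, "ref", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    Object ref;

    void reads() {
        Object plain = REF.get(this);          // no ordering constraints across threads
        Object opaque = REF.getOpaque(this);   // per-variable coherence, but no acquire pairing
        Object acquire = REF.getAcquire(this); // pairs with setRelease for happens-before
    }
}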

@ben-manes
Owner

ben-manes commented Jan 8, 2025

I'll have to revisit this in the coming days, as I am a bit worn out today, because I still think that at a purely technical level the plain reads are fine. I'm also not against using an opaque read since the cost is not visible, but I feel like the reason would be me saying "I'm not sure, hope this is right". I'd rather be more confident while also being pragmatic about satisfying concerns. If we decide to use opaque for pragmatic reasons, we should feel okay commenting on that so it's understood why that mode was decided on (correctness, tooling, unsure but seems safer, etc).

My understanding is that a release store means that when the thread constructs the newValue, the compiler and the imposed memory barrier force all of the field assignments to have occurred before that reference is published. This means that for the reader to even observe the newValue pointer, it must be fully constructed and safe to use. Since final fields in a constructor are set with a release barrier, this makes sense: other threads that read the object do not suffer from the partial construction problem. The final fields must have been set by the time the plain read occurs. Otherwise the semantics break down entirely and opaque would be needed by Java as its default accessor, which is not the case.

The 32-bit aspect is in reference to torn reads of long/double primitives, and I recall that with the removal of the last 32-bit architecture from OpenJDK they had planned on updating the spec to assume 64-bit minimum values (probably wrt Valhalla). The opaque mode was strengthened to guarantee bitwise atomicity for those cases, but that is a historical oddity that should not be relevant here. The meaningful remainder of opaque is program order within the single thread, a local restriction on reordering, so that certain reads and writes must be performed in the program's order rather than by the compiler's optimized choice. This local ordering has no bearing on cross-thread synchronization, such as whether a thread could observe a partial object published by another thread.

These are the references I usually turn to, and I will try to review them. You might want to skim them as they are very insightful; this is a tricky subject.

@aoeui
Author

aoeui commented Jan 8, 2025

My understanding is that a release store means that when the thread constructs the newValue, the compiler and the imposed memory barrier force all of the field assignments to have occurred before that reference is published. This means that for the reader to even observe the newValue pointer, it must be fully constructed and safe to use. Since final fields in a constructor are set with a release barrier, this makes sense: other threads that read the object do not suffer from the partial construction problem. The final fields must have been set by the time the plain read occurs.

Unfortunately, none of the fields in com.github.benmanes.caffeine.cache.References.WeakValueReference are final.

Otherwise the semantics break down entirely and opaque would be needed by Java as its default accessor, which is not the case.

Across threads it's mostly the case that plain reads are the exception (though they are used deliberately in some places). You almost always need either a volatile read or one of the special Unsafe / VarHandle methods that insert read barriers.

Thanks for taking the time to look up and point out your references. It's difficult to know what references are trustworthy. I see an exact match for our situation here.

https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-semi-sync
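The pitfall there is roughly this shape (paraphrased, not the post's exact code): a synchronized writer paired with an unsynchronized reader.

class RacyBoxy {
    static class Box {
        int x; // non-final, so there is no constructor freeze guarantee
        Box(int v) { x = v; }
    }

    Box box;

    synchronized void set(Box v) {
        box = v; // the writer holds the lock, but the reader never does
    }

    Box get() {
        return box; // racy read: may see box != null while box.x is still 0
    }
}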

@ben-manes
Owner

I’ll review later, but ChatGPT disagrees with you, see below. This project’s CI uses both x86 and arm64’s weaker memory model for validation.


Your scenario does not align with the pitfall described in the article because your design uses explicit release-acquire semantics (setRelease on the writer) to safely publish the WeakValueReference object, whereas the code in the article relies on insufficient synchronization (a synchronized writer but an unsynchronized reader).

Here’s a detailed comparison of your scenario and the pitfall:

Key Issue in the Article’s Example

In the article:
1. The writer (set(Box v)) is synchronized, meaning it creates a happens-before relationship for any thread reading the RacyBoxy reference if they use proper synchronization.
2. The reader (get()) is not synchronized, meaning it does not participate in the synchronization protocol.
• This breaks the happens-before relationship between threads.
• Another thread may observe the box reference (the Box object) being non-null while its fields (like x) are still in their default state due to reordering or lack of visibility guarantees.

In short, without proper synchronization on both sides, the Box fields may appear to be in an incomplete state, even though the Box reference itself is fully visible.

How Your Scenario Differs
1. Writer Uses setRelease
In your case, the setRelease ensures:
• All prior writes (e.g., initializing the fields of WeakValueReference) are visible before the reference to the WeakValueReference itself is published.
• The publication of the reference has a clear memory ordering: other threads will observe the fully constructed object once they see the reference.
2. Reader Behavior
In your scenario:
• The reader performs a plain read (or optionally an opaque or acquire read) to access the reference.
• While a plain read does not enforce memory ordering, it is sufficient here because:
• The release barrier (setRelease) on the writer side ensures that the reference is only made visible after the fields of WeakValueReference are fully initialized.
• The publishing guarantee ensures that no other thread can observe a partially constructed object.
By contrast, in the article’s example, the get() method does not provide any synchronization mechanism, so there is no guarantee that the reader sees a fully constructed object.
3. Memory Model Compliance
• Your use of setRelease establishes the necessary happens-before relationship for safe publication.
• A plain read is sufficient because it observes the already-published object. This is backed by the JMM’s guarantees about release-acquire synchronization.

Why the Article’s Pitfall Doesn’t Apply to Your Case

The problem in the article arises because:
1. The writer uses synchronization, but the reader doesn’t.
2. There’s no happens-before relationship between the writer and the reader, leading to potential visibility issues.

In contrast, your scenario avoids this pitfall because:
1. You explicitly use release-acquire semantics to ensure safe publication.
2. The reference is only made visible after all fields are initialized.
3. The use of setRelease on the writer side ensures that readers cannot observe a partially constructed object.

Takeaway

The pitfall in the article is a classic case of insufficient synchronization. Your scenario, however, is well-designed with explicit release semantics, ensuring proper object publication. Therefore, the concerns raised in the article about reordering and field visibility do not apply to your case.

@ben-manes
Owner

ben-manes commented Jan 8, 2025

Unfortunately, none of the fields in com.github.benmanes.caffeine.cache.References.WeakValueReference are final.

I meant that final is translated into a hardware concept, a release barrier. We do this explicitly to get the same effect. Java’s WeakReference referent field is also not final, nor volatile, and is a plain read. That doesn’t mean it’s wrong, just that the language doesn’t explicitly model the hardware modes being relied on in subtle ways.
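For contrast, a minimal illustration of the final-field guarantee being discussed (not from the Caffeine codebase):

class SafeBox {
    final int x; // final: frozen at constructor exit, acting like a release,
                 // so even a racy plain publication never exposes x == 0
    SafeBox() { x = 5; } // (provided `this` does not escape the constructor)
}

class UnsafeBox {
    int x; // non-final: a racy plain publication may let a reader observe x == 0
    UnsafeBox() { x = 5; }
}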

@aoeui
Author

aoeui commented Jan 8, 2025

It would be cool if you could share the prompt you used. I think that what you said would work if the fields in WeakValueReference were final, but I don't think it works here. I'm very wary of ChatGPT as it tends to give you the result you want rather easily. I tried the following prompt in both ChatGPT and Gemini, and both gave the same response.

Say I have something like:

class Box {
  int x;

  Box() {
    x = 5;
  }
}

...
HANDLE.setRelease(obj, new Box());

where HANDLE is a VarHandle to a Box field named box in obj. Can another thread, by doing a plain read of box.x observe a 0?

Both said yes, it is possible to see a 0, and they both gave the same explanation.

"Crucially: The setRelease only guarantees that the write to the field in obj happens-before subsequent acquire-semantics reads (like getAcquire or getOpaque) of that same field in another thread.

It does not provide any guarantees about the visibility of writes that happened inside the Box constructor to other threads that do not use acquire semantics to read the field."

What is it that I'm not understanding?

@ben-manes
Owner

ben-manes commented Jan 8, 2025

The reason why I thought the scenarios differ is because of where the StoreStore/LoadStore barrier is applied, and what other data visibility relies on piggybacking on that fence. All operations prior to that store must be committed before it, which ensures that a thread observing that store must also observe all of those prior stores. I think this was covered in On The Fence With Dependencies.

The synchronized example relies upon the unlock store being visible. Entering the lock has acquire semantics and exiting has release semantics. Entering is typically implemented as a CAS on a lock variable, which does not guarantee visibility of other writes to unrelated variables, whereas exiting is an unconditional volatile write. Therefore, the plain stores can be reordered so that the Box instance is written before its constructor finishes, and the CPU store buffer could flush that before the synchronized action is unlocked. When it is unlocked the release store occurs, but that is too late, as another thread could have read the partial Box instance.

That example differs from Caffeine's, where we use the release store on the variable of interest and are not performing plain stores to shared visible memory prior to it. The newly created WeakValueReference is only visible to the setValue thread until the release store, which forces all prior stores to also be observable.

Here's the chatgpt session from yesterday.

@aoeui
Author

aoeui commented Jan 8, 2025

Thank you for sharing the prompt. I think there are two places where ChatGPT gets confused.

  1. "The entry is only modified under a lock, synchronized (PW), where PW is the instance. This is typically to set the value. The key is only modified after it was removed from the map either by an explicit removal or eviction, and the entry's lifecycle is encoded as a sentinel instance. "

I agree that the Node is modified under a lock, but it is not locked in evictEntry, where the data race occurs. The only lock held at that point is evictionLock. However, evictionLock is not held by the thread performing the put and doing the node creation. If you lock when writing but read without locking, it is still a data race.

  1. "what memory mode does a final field represent? Is it an opaque write? release? How would I simulate that constructor barrier?"

I think ChatGPT starts to think that the fields are final here. It's true that setRelease would be enough if the fields were final, but that is not the case here.

@ben-manes
Owner

A final field has release store semantics, whereas a non-final field defaults to plain. The VALUE.setRelease enforces that to see the new instance, all of the preceding stores must also be observable by the other threads. ChatGPT is only a sanity check to poke holes, not something to be relied upon. The guarantee comes from what the memory model promises.

If I get some time, I'll try to reproduce the Box sample and see if it fails on my MacBook M3 due to arm64's weaker memory model (though Apple offers the stronger x86 one, so unsure). If not, I'll try to use the CircleCI arm64 test runner we use for this project. If it can fail, then we can change it to the setRelease and show what happens on real hardware. Right now it's a debate over whether TSAN is correct; all other TSAN reports have been invalid, so I don't discount it, but I also don't trust it more than the actual memory model.

@aoeui
Author

aoeui commented Jan 8, 2025

The VALUE.setRelease enforces that to see the new instance, all of the preceding stores must also be observable by the other threads.

My understanding of the memory model (not claiming to be an expert here) is that, while that's true, you still need to use a read with some kind of acquire semantics to actually see the preceding stores from a different thread.

If it can fail, then we can change it to the setRelease and show what happens on real hardware.

Thanks, that experiment sounds like it'd be really helpful. It's so weird when you actually observe memory visibility issues in the real world.

@ben-manes
Owner

My understanding is that the memory model of release has a guaranteed happens-before relationship that all prior writes are made visible to other threads that subsequently read the variable at any strength. The read would require an acquire to guarantee that it saw the new instance. That doesn't mean a plain read would observe a partial out-of-order write for the new instance, only that it can observe either the old or the new instance.

If there were no cache coherence protocol, and it was just raw memory and cache access of raw bits, then I agree a partial object could be observed. Thankfully the hardware still has the plain read participate, observing either the most recent version of the value in the current thread's cache (or registers) or the most recent value written by another thread as the changes propagate. It will always see a coherent and valid state of memory, but without synchronization there is no timing or staleness guarantee.

If acquire semantics were needed then opaque would not fix it, because it lacks that effect. That's what makes me hesitant about TSAN having a true positive. On modern hardware an acquire is not expensive, so I'm not against it if we decide to, and pragmatically any gain from tuning memory barriers is noise. I just don't want to make the code more confusing, and I want to ensure we confidently know what we're doing and why.

@ben-manes
Owner

The test passed.

The article's sample is more fully fleshed out as AdvancedJMM_06_SemiSynchronized in the jcstress repository. Prior to noticing that, I set it up in this repository and then switched to that code since the API changed a bit. Unfortunately the Gradle plugin is unsupported and has deprecations, so I will not merge this in. I added the following scenario.

@JCStressTest
@State
@Outcome(id = {"-1", "4"}, expect = ACCEPTABLE, desc = "Boring")
@Outcome(                  expect = FORBIDDEN,  desc = "Now forbidden")
public static class VarHandleRacy {
    public static final VarHandle HANDLE;
    static {
      try {
        HANDLE = MethodHandles.lookup()
            .findVarHandle(VarHandleRacy.class, "c", Composite.class);
      } catch (ReflectiveOperationException e) {
        throw new ExceptionInInitializerError(e);
      }
    }

    Composite c;

    @Actor
    void actor() {
        HANDLE.setRelease(this, new Composite(1));
    }

    @Actor
    void observer(I_Result r) {
        var lc = (Composite) HANDLE.get(this);
        r.r1 = (lc != null) ? lc.get() : -1;
    }
}

The console log shows that only RacyRead and RacyWrite fail, while NonRacy and VarHandleRacy pass. You can download this HTML report for more details and apply this patch to run it.

@aoeui
Author

aoeui commented Jan 9, 2025

Amazing. Thank you. I learned a lot going through this.

It might be nice if there were a comment in com.github.benmanes.caffeine.cache.node.AddKey explaining why the plain read is actually okay. It's pretty subtle.

@charlesmunger

charlesmunger commented Feb 1, 2025

My understanding is that the memory model of release has a guaranteed happens-before relationship that all prior writes are made visible to other threads that subsequently read the variable at any strength.

This is a common misunderstanding. As a simple-ish example, consider the difference between final fields (which actually have that guarantee) and a VarHandle release write, on ARM, using the Android Runtime (which I'm most familiar with, and which is easily accessible at godbolt.org).

Ordinary loads and stores are implemented with ldr and str instructions. A load-acquire is implemented with ldar and a store-release with stlr. However, final fields are implemented via a totally different mechanism: dmb ishst.

One way to model how this can go wrong in your head, without reading ARM's 100 MB PDF on their architecture, is through a theoretical model of cache invalidation. Suppose you have two memory addresses, A and B (both initialized to 0), and two processor cores with separate L1 and a shared L2 cache.

str 1, [A]
stlr 1, [B]

On another core:

ldar w0, [B]
ldr w1, [A]

This next explanation is not how it actually works but is simplified to show how a processor using the arm instruction set could fail publication:

The values the reader can see for (w0, w1) are (0, 0), (0, 1), or (1, 1). This is because the stlr forces the write to A to go at least as far up in the cache hierarchy as the write to B, and the ldar forces the reader to invalidate any other dirty cache lines at the same level where it finds the value it's reading. So if A is in L1 cache and B is in L2 cache, that entry will be marked dirty by the act of reading B.

What if we don't load-acquire?

ldr w0, [B]
ldr w1, [A]

For one thing, out-of-order execution means that this could just load A before B. But imagine for the sake of argument that there is a data dependency between the load of A and B, such that you do have a guaranteed read order:

str 1, [A]
stlr A, [B]

On another core:

ldr w0, [B]
ldr w1, [w0]

If the reading processor has address A in its L1 cache and finds B in the shared L2 cache, it would be perfectly valid for it to give you w1 = 0 even after observing the published pointer in w0 - a big disaster. But there is a way to make this valid, used to implement final field semantics, that is a superset of what is actually guaranteed by the Java memory model:

str 1, [A]
dmb ishst
str A, [B]

On another core:

ldr w0, [B]
ldr w1, [w0]

This gives the same guarantee as before, and is what people often expect to happen with release memory order. In our simplified cache model, the dmb ishst basically says "flush my dirty cache lines up to the L2 cache, then invalidate the cache entry in every core's cache that matches one of my dirty cache lines". If A was in the reader core's L1 cache, it would be invalidated. So if the reader core sees w0 as non-null, it must go up to the L2 cache and produce w1 = 1. The Java memory model guarantees this visibility for anything reachable through the final field you set.

Compiler Explorer

In short, release memory order is only meaningful if you do an acquire when you load, and vice versa.
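As a minimal illustration of that pairing with VarHandles (a sketch, not production code):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Pairing {
    static final VarHandle BOX;
    static {
        try {
            BOX = MethodHandles.lookup().findVarHandle(Pairing.class, "box", int[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int[] box;

    void writer() {
        int[] b = { 42 };        // plain store to the payload
        BOX.setRelease(this, b); // release: the payload store cannot be reordered after this
    }

    void reader() {
        int[] b = (int[]) BOX.getAcquire(this); // acquire: pairs with the release above
        if (b != null) {
            assert b[0] == 42; // guaranteed once the acquire read observes b; with a
        }                      // plain BOX.get(this), the JMM gives no such guarantee
    }
}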
