[LoongArch] Improve codegen for atomic cmpxchg ops #69339
Conversation
PR #67391 improved atomic codegen by handling the memory ordering specified by the `cmpxchg` instruction: an acquire barrier needs to be generated when the memory ordering includes an acquire operation. This PR improves the codegen further by keying the barrier choice on the failure ordering alone (on success, the SC instruction executes and its own barrier already provides the acquire semantics).
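For example (a minimal LLVM IR sketch distilled from the test updates below; the function name is hypothetical), a cmpxchg whose success ordering is acquire but whose failure ordering is monotonic no longer needs the acquire barrier hint on its failure path:

; Success ordering acquire, failure ordering monotonic: on the failure
; path no acquire barrier is required, so after this PR the expansion
; emits the plain LL-SC load-load barrier (dbar 1792, i.e. 0x700)
; instead of the acquire barrier (dbar 20, i.e. 0b10100).
define i32 @cas_failure_monotonic(ptr %p, i32 %cmp, i32 %val) nounwind {
  %pair = cmpxchg ptr %p, i32 %cmp, i32 %val acquire monotonic
  %old = extractvalue { i32, i1 } %pair, 0
  ret i32 %old
}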
@llvm/pr-subscribers-backend-loongarch

Author: Lu Weining (SixWeining)

Changes: PR #67391 improved atomic codegen by handling the memory ordering specified by the `cmpxchg` instruction. An acquire barrier needs to be generated when the memory ordering includes an acquire operation. This PR improves the codegen further by only handling the failure ordering.

Full diff: https://github.com/llvm/llvm-project/pull/69339.diff

4 Files Affected:
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
index b348cb56c136199..18a532b55ee5a92 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
@@ -571,11 +571,11 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
}
- AtomicOrdering Ordering =
+ AtomicOrdering FailureOrdering =
static_cast<AtomicOrdering>(MI.getOperand(IsMasked ? 6 : 5).getImm());
int hint;
- switch (Ordering) {
+ switch (FailureOrdering) {
case AtomicOrdering::Acquire:
case AtomicOrdering::AcquireRelease:
case AtomicOrdering::SequentiallyConsistent:
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index ecdbaf92e56c1a9..334daccab1e8ba0 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -4184,8 +4184,9 @@ LoongArchTargetLowering::shouldExpandAtomicCmpXchgInIR(
Value *LoongArchTargetLowering::emitMaskedAtomicCmpXchgIntrinsic(
IRBuilderBase &Builder, AtomicCmpXchgInst *CI, Value *AlignedAddr,
Value *CmpVal, Value *NewVal, Value *Mask, AtomicOrdering Ord) const {
- Value *Ordering =
- Builder.getIntN(Subtarget.getGRLen(), static_cast<uint64_t>(Ord));
+ AtomicOrdering FailOrd = CI->getFailureOrdering();
+ Value *FailureOrdering =
+ Builder.getIntN(Subtarget.getGRLen(), static_cast<uint64_t>(FailOrd));
// TODO: Support cmpxchg on LA32.
Intrinsic::ID CmpXchgIntrID = Intrinsic::loongarch_masked_cmpxchg_i64;
@@ -4196,7 +4197,7 @@ Value *LoongArchTargetLowering::emitMaskedAtomicCmpXchgIntrinsic(
Function *MaskedCmpXchg =
Intrinsic::getDeclaration(CI->getModule(), CmpXchgIntrID, Tys);
Value *Result = Builder.CreateCall(
- MaskedCmpXchg, {AlignedAddr, CmpVal, NewVal, Mask, Ordering});
+ MaskedCmpXchg, {AlignedAddr, CmpVal, NewVal, Mask, FailureOrdering});
Result = Builder.CreateTrunc(Result, Builder.getInt32Ty());
return Result;
}
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
index a892ae65f610d03..3f67494bb284ae6 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
@@ -1814,7 +1814,7 @@ def PseudoMaskedAtomicLoadMin32 : PseudoMaskedAMMinMax;
class PseudoCmpXchg
: Pseudo<(outs GPR:$res, GPR:$scratch),
- (ins GPR:$addr, GPR:$cmpval, GPR:$newval, grlenimm:$ordering)> {
+ (ins GPR:$addr, GPR:$cmpval, GPR:$newval, grlenimm:$fail_order)> {
let Constraints = "@earlyclobber $res,@earlyclobber $scratch";
let mayLoad = 1;
let mayStore = 1;
@@ -1828,7 +1828,7 @@ def PseudoCmpXchg64 : PseudoCmpXchg;
def PseudoMaskedCmpXchg32
: Pseudo<(outs GPR:$res, GPR:$scratch),
(ins GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask,
- grlenimm:$ordering)> {
+ grlenimm:$fail_order)> {
let Constraints = "@earlyclobber $res,@earlyclobber $scratch";
let mayLoad = 1;
let mayStore = 1;
@@ -1846,6 +1846,43 @@ class AtomicPat<Intrinsic intrin, Pseudo AMInst>
: Pat<(intrin GPR:$addr, GPR:$incr, GPR:$mask, timm:$ordering),
(AMInst GPR:$addr, GPR:$incr, GPR:$mask, timm:$ordering)>;
+// These atomic cmpxchg PatFrags only care about the failure ordering.
+// The PatFrags defined by multiclass `ternary_atomic_op_ord` in
+// TargetSelectionDAG.td care about the merged memory ordering that is the
+// stronger one between success and failure. But for LoongArch LL-SC we only
+// need to care about the failure ordering as explained in PR #67391. So we
+// define these PatFrags that will be used to define cmpxchg pats below.
+multiclass ternary_atomic_op_failure_ord {
+ def NAME#_failure_monotonic : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+ (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+ AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+ return Ordering == AtomicOrdering::Monotonic;
+ }]>;
+ def NAME#_failure_acquire : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+ (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+ AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+ return Ordering == AtomicOrdering::Acquire;
+ }]>;
+ def NAME#_failure_release : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+ (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+ AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+ return Ordering == AtomicOrdering::Release;
+ }]>;
+ def NAME#_failure_acq_rel : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+ (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+ AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+ return Ordering == AtomicOrdering::AcquireRelease;
+ }]>;
+ def NAME#_failure_seq_cst : PatFrag<(ops node:$ptr, node:$cmp, node:$val),
+ (!cast<SDPatternOperator>(NAME) node:$ptr, node:$cmp, node:$val), [{
+ AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getFailureOrdering();
+ return Ordering == AtomicOrdering::SequentiallyConsistent;
+ }]>;
+}
+
+defm atomic_cmp_swap_32 : ternary_atomic_op_failure_ord;
+defm atomic_cmp_swap_64 : ternary_atomic_op_failure_ord;
+
let Predicates = [IsLA64] in {
def : AtomicPat<int_loongarch_masked_atomicrmw_xchg_i64,
PseudoMaskedAtomicSwap32>;
@@ -1908,24 +1945,24 @@ def : AtomicPat<int_loongarch_masked_atomicrmw_umin_i64,
// AtomicOrdering.h.
multiclass PseudoCmpXchgPat<string Op, Pseudo CmpXchgInst,
ValueType vt = GRLenVT> {
- def : Pat<(vt (!cast<PatFrag>(Op#"_monotonic") GPR:$addr, GPR:$cmp, GPR:$new)),
+ def : Pat<(vt (!cast<PatFrag>(Op#"_failure_monotonic") GPR:$addr, GPR:$cmp, GPR:$new)),
(CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 2)>;
- def : Pat<(vt (!cast<PatFrag>(Op#"_acquire") GPR:$addr, GPR:$cmp, GPR:$new)),
+ def : Pat<(vt (!cast<PatFrag>(Op#"_failure_acquire") GPR:$addr, GPR:$cmp, GPR:$new)),
(CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 4)>;
- def : Pat<(vt (!cast<PatFrag>(Op#"_release") GPR:$addr, GPR:$cmp, GPR:$new)),
+ def : Pat<(vt (!cast<PatFrag>(Op#"_failure_release") GPR:$addr, GPR:$cmp, GPR:$new)),
(CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 5)>;
- def : Pat<(vt (!cast<PatFrag>(Op#"_acq_rel") GPR:$addr, GPR:$cmp, GPR:$new)),
+ def : Pat<(vt (!cast<PatFrag>(Op#"_failure_acq_rel") GPR:$addr, GPR:$cmp, GPR:$new)),
(CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 6)>;
- def : Pat<(vt (!cast<PatFrag>(Op#"_seq_cst") GPR:$addr, GPR:$cmp, GPR:$new)),
+ def : Pat<(vt (!cast<PatFrag>(Op#"_failure_seq_cst") GPR:$addr, GPR:$cmp, GPR:$new)),
(CmpXchgInst GPR:$addr, GPR:$cmp, GPR:$new, 7)>;
}
defm : PseudoCmpXchgPat<"atomic_cmp_swap_32", PseudoCmpXchg32>;
defm : PseudoCmpXchgPat<"atomic_cmp_swap_64", PseudoCmpXchg64, i64>;
def : Pat<(int_loongarch_masked_cmpxchg_i64
- GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$ordering),
+ GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$fail_order),
(PseudoMaskedCmpXchg32
- GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$ordering)>;
+ GPR:$addr, GPR:$cmpval, GPR:$newval, GPR:$mask, timm:$fail_order)>;
def : PseudoMaskedAMMinMaxPat<int_loongarch_masked_atomicrmw_max_i64,
PseudoMaskedAtomicLoadMax32>;
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
index 76f9ebed0d93ba4..417c865f6383ffb 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
@@ -129,7 +129,7 @@ define void @cmpxchg_i8_acquire_monotonic(ptr %ptr, i8 %cmp, i8 %val) nounwind {
; LA64-NEXT: beqz $a5, .LBB4_1
; LA64-NEXT: b .LBB4_4
; LA64-NEXT: .LBB4_3:
-; LA64-NEXT: dbar 20
+; LA64-NEXT: dbar 1792
; LA64-NEXT: .LBB4_4:
; LA64-NEXT: ret
%res = cmpxchg ptr %ptr, i8 %cmp, i8 %val acquire monotonic
@@ -162,7 +162,7 @@ define void @cmpxchg_i16_acquire_monotonic(ptr %ptr, i16 %cmp, i16 %val) nounwin
; LA64-NEXT: beqz $a5, .LBB5_1
; LA64-NEXT: b .LBB5_4
; LA64-NEXT: .LBB5_3:
-; LA64-NEXT: dbar 20
+; LA64-NEXT: dbar 1792
; LA64-NEXT: .LBB5_4:
; LA64-NEXT: ret
%res = cmpxchg ptr %ptr, i16 %cmp, i16 %val acquire monotonic
@@ -181,7 +181,7 @@ define void @cmpxchg_i32_acquire_monotonic(ptr %ptr, i32 %cmp, i32 %val) nounwin
; LA64-NEXT: beqz $a4, .LBB6_1
; LA64-NEXT: b .LBB6_4
; LA64-NEXT: .LBB6_3:
-; LA64-NEXT: dbar 20
+; LA64-NEXT: dbar 1792
; LA64-NEXT: .LBB6_4:
; LA64-NEXT: ret
%res = cmpxchg ptr %ptr, i32 %cmp, i32 %val acquire monotonic
@@ -200,7 +200,7 @@ define void @cmpxchg_i64_acquire_monotonic(ptr %ptr, i64 %cmp, i64 %val) nounwin
; LA64-NEXT: beqz $a4, .LBB7_1
; LA64-NEXT: b .LBB7_4
; LA64-NEXT: .LBB7_3:
-; LA64-NEXT: dbar 20
+; LA64-NEXT: dbar 1792
; LA64-NEXT: .LBB7_4:
; LA64-NEXT: ret
%res = cmpxchg ptr %ptr, i64 %cmp, i64 %val acquire monotonic
Thank you.
The corresponding GCC change (and its release-branch backports) references this PR with the following commit message:

This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL: <memory-barrier> + <load-exclusive>
- SC: <store-conditional> + <memory-barrier>

But the compare-and-swap operation is allowed to fail, and if it fails the SC instruction is not executed, so the acquire semantics cannot be guaranteed. Therefore, an acquire barrier needs to be generated when failure_memorder includes an acquire operation. On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an acquire barrier; on CPUs implementing LoongArch v1.00, it is a full barrier. So it is always enough for acquire semantics. On the other hand, if acquire semantics are not needed, we still need "dbar 0x700" as the load-load barrier, like in all LL-SC loops.

[1]: llvm/llvm-project#67391
[2]: llvm/llvm-project#69339

Backported to the release branches to fix the acquire-semantics issue known to cause trouble on LA664 (cherry picked from commit 4d86dc5).

gcc/ChangeLog:

	* config/loongarch/loongarch.cc
	(loongarch_memmodel_needs_release_fence): Remove.
	(loongarch_cas_failure_memorder_needs_acquire): New static
	function.
	(loongarch_print_operand): Redefine 'G' for the barrier on CAS
	failure.
	* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
	Remove the redundant barrier before the LL instruction, and emit
	an acquire barrier on failure if needed by failure_memorder.
	(atomic_cas_value_cmp_and_7_<mode>): Likewise.
	(atomic_cas_value_add_7_<mode>): Remove the unnecessary barrier
	before the LL instruction.
	(atomic_cas_value_sub_7_<mode>): Likewise.
	(atomic_cas_value_and_7_<mode>): Likewise.
	(atomic_cas_value_xor_7_<mode>): Likewise.
	(atomic_cas_value_or_7_<mode>): Likewise.
	(atomic_cas_value_nand_7_<mode>): Likewise.
	(atomic_cas_value_exchange_7_<mode>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/cas-acquire.c: New test.
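To illustrate the contrast the commit message draws (a minimal LLVM IR sketch under the same assumptions as the tests above; the function name is hypothetical): when the failure ordering itself is acquire, the failure path keeps the stronger barrier hint.

; Failure ordering is acquire: even when the SC is not executed, the
; failure path must still provide acquire semantics, so the expansion
; keeps the acquire barrier hint (dbar 20, i.e. 0b10100) rather than
; falling back to the plain LL-SC load-load barrier (dbar 1792 / 0x700).
define i32 @cas_failure_acquire(ptr %p, i32 %cmp, i32 %val) nounwind {
  %pair = cmpxchg ptr %p, i32 %cmp, i32 %val acquire acquire
  %old = extractvalue { i32, i1 } %pair, 0
  ret i32 %old
}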