Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Overlapping registers allocated to input and lateout constraints #106380

Closed
nikic opened this issue Aug 28, 2024 · 10 comments · Fixed by #108951
Closed

[AArch64] Overlapping registers allocated to input and lateout constraints #106380

nikic opened this issue Aug 28, 2024 · 10 comments · Fixed by #108951
Assignees

Comments

@nikic
Copy link
Contributor

nikic commented Aug 28, 2024

https://llvm.godbolt.org/z/xz6PYTaW3

target triple = "aarch64-unknown-linux-gnu"

define i1 @test(ptr %ptr) "target-features"="+lse" {
  %2 = call { i64, i64 } asm sideeffect "casp x2, x3, x2, x3, [${2}]", "=&{x2},=&{x3},r,0,1,~{memory}"(ptr %ptr, i64 0, i64 0)
  %3 = call i32 asm sideeffect "2:\0Aldxp xzr, ${0}, [${1}]\0Astxp ${0:w}, ${2}, ${3}, [${1}]\0Acbnz ${0:w}, 2b\0A", "=&r,r,r,r,~{memory}"(ptr %ptr, i64 0, i64 0)
  %4 = call i32 asm sideeffect "2:\0Aldxp xzr, ${0}, [${1}]\0Astxp ${0:w}, ${2}, ${3}, [${1}]\0Acbnz ${0:w}, 2b\0A", "=&r,r,r,r,~{memory}"(ptr %ptr, i64 undef, i64 0)
  ret i1 false
}

Results in:

<inline asm>:3:6: error: unpredictable STXP instruction, status is also a source
stxp w8, x8, x10, [x0]

Note that ${0:w} and ${2} are allocated to overlapping registers, even though a lateout =& constraint is used.

@llvmbot
Copy link
Member

llvmbot commented Aug 28, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Nikita Popov (nikic)

https://llvm.godbolt.org/z/xz6PYTaW3 ```llvm target triple = "aarch64-unknown-linux-gnu"

define i1 @test(ptr %ptr) "target-features"="+lse" {
%2 = call { i64, i64 } asm sideeffect "casp x2, x3, x2, x3, [${2}]", "=&{x2},=&{x3},r,0,1,{memory}"(ptr %ptr, i64 0, i64 0)
%3 = call i32 asm sideeffect "2:\0Aldxp xzr, ${0}, [${1}]\0Astxp ${0:w}, ${2}, ${3}, [${1}]\0Acbnz ${0:w}, 2b\0A", "=&r,r,r,r,
{memory}"(ptr %ptr, i64 0, i64 0)
%4 = call i32 asm sideeffect "2:\0Aldxp xzr, ${0}, [${1}]\0Astxp ${0:w}, ${2}, ${3}, [${1}]\0Acbnz ${0:w}, 2b\0A", "=&r,r,r,r,~{memory}"(ptr %ptr, i64 undef, i64 0)
ret i1 false
}

Results in:

<inline asm>:3:6: error: unpredictable STXP instruction, status is also a source
stxp w8, x8, x10, [x0]

Note that `${0:w}` and `${2}` are allocated to overlapping registers, even though a lateout `=&amp;` constraint is used.
</details>

@nikic
Copy link
Contributor Author

nikic commented Aug 28, 2024

The fact that one is a subreg doesn't seem to be relevant (same behavior if using i64 result and dropping the :ws).

The fact that the operand is undef does seem relevant, but I'm not sure if it just perturbs allocation.

Maybe there is an (incorrect) assumption that clobbering does not matter for undef operands?

@davemgreen
Copy link
Collaborator

This sounds like it might be similar to the issue #77770 was addressing, IIRC. I don't know much more about it than that though (like why the register allocator is doing it or why it isn't better to fix as part of the register allocator).

@nikic
Copy link
Contributor Author

nikic commented Aug 28, 2024

Maybe @qcolombet has some insight into the regalloc behavior here.

@qcolombet
Copy link
Collaborator

The fact that the operand is undef does seem relevant, but I'm not sure if it just perturbs allocation.

It perturbs the allocation because it says that we do not care about the content of the register. Hence, we end up with an empty live-range which doesn't interfere with anything else and we end up reusing something that is already used. This is normally fine, but breaks the verifier here.

Technically I would say you get what you asked for: undef stored value ==> undef behavior for STXP.
I'm playing devil's advocate here to be fair.

I don't see an easy fix in the allocator though.

@nikic
Copy link
Contributor Author

nikic commented Aug 28, 2024

The fact that the operand is undef does seem relevant, but I'm not sure if it just perturbs allocation.

It perturbs the allocation because it says that we do not care about the content of the register. Hence, we end up with an empty live-range which doesn't interfere with anything else and we end up reusing something that is already used. This is normally fine, but breaks the verifier here.

Technically I would say you get what you asked for: undef stored value ==> undef behavior for STXP. I'm playing devil's advocate here to be fair.

I don't think this interpretation is quite right in this case, because an undef operand here should behave as-if it were any arbitrary value, while STXP with two equal registers is "constrained unpredictable" in a way that extends beyond that. That is, it may exhibit behavior that would not be observable for any specific choice of the undef value.

Independently of the STXP case, my expectation would also be that an inline asm block can only observe a single value for an input register, even if it has an undef value. Though given IR undef semantics, I can see how a reasonable person might disagree with that expectation...

@efriedma-quic
Copy link
Collaborator

I agree the current behavior is wrong; ensuring no overlap should be more intuitive for people writing inline asm. (Granted, I doubt this has much practical effect either way... if you end up with an inline asm with an undef input, you're probably doing something wrong.)

This is essentially the same as #77770.

@nikic
Copy link
Contributor Author

nikic commented Aug 30, 2024

It seems like making InitUndef work on AArch64 is the right way forward here. I've put up #106744 to make it easier to extend the pass to additional targets.

nikic added a commit that referenced this issue Sep 12, 2024
This is a variant of #106380
without inline assembly.
@nikic
Copy link
Contributor Author

nikic commented Sep 12, 2024

#108353 enables the pass for all targets and this does fix the equivalent issue when using intrinsics. Just enabling the pass doesn't help the inline asm case though, so something more is missing...

@nikic
Copy link
Contributor Author

nikic commented Sep 12, 2024

The missing bit turned out to be that the pass iterates over defs() instead of all_defs(). I'll fix that after the main change lands.

@nikic nikic self-assigned this Sep 17, 2024
nikic added a commit to nikic/llvm-project that referenced this issue Sep 17, 2024
InitUndef should also handle early-clobber / undef conflicts in
inline asm operands. Do this by iterating over all_defs() instead
of defs().

The newly added test was generating an "unpredictable STXP instruction,
status is also a source" error prior to this change.

Fixes llvm#106380.
@nikic nikic closed this as completed in 7183771 Sep 19, 2024
tmsri pushed a commit to tmsri/llvm-project that referenced this issue Sep 19, 2024
InitUndef should also handle early-clobber / undef conflicts in inline
asm operands. Do this by iterating over all_defs() instead of defs().

The newly added ARM test was generating an "unpredictable STXP instruction,
status is also a source" error prior to this change.

Fixes llvm#106380.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants