[HVX] Simplify constant factor before distributing #7009

Merged: 2 commits merged into main from rootjalex/distribute-w_shl on Sep 12, 2022

Conversation

@rootjalex (Member)

The handling of widening_shift_left in HexagonOptimize's DistributeShiftsAsMuls was not producing widening_muls for constant shift amounts; this PR fixes that by simplifying the constant factor before distributing. Here's a small illustrative example:

ImageParam in_i8(Int(8), 1);
Func kernel("kernel");
Var x("x");

kernel(x) = i16(in_i8(x - 1)) + 2 * i16(in_i8(x)) + i16(in_i8(x + 1));
kernel.vectorize(x, 128);

Compiled with target="hexagon-64-noos-no_bounds_query-no_asserts-hvx_128-hvx_v66".

Before DistributeShiftsAsMuls:

kernel[ramp(kernel.s0.x.x*128, 1, 128) aligned(128, 0)] = ((int16x128)widening_shift_left(p0[ramp(t6, 1, 128)], x128((uint8)1)) + int16x128(p0[ramp(t6 + -1, 1, 128)])) + int16x128(p0[ramp(t6 + 1, 1, 128)])

Previously, after DistributeShiftsAsMuls:

kernel[ramp(kernel.s0.x.x*128, 1, 128) aligned(128, 0)] = ((int16x128(p0[ramp(t6, 1, 128)])*x128((int16)2)) + int16x128(p0[ramp(t6 + -1, 1, 128)])) + int16x128(p0[ramp(t6 + 1, 1, 128)])

Previously, codegen: 3 vsxt and 2 vmpyihb.acc

Now, after DistributeShiftsAsMuls:

kernel[ramp(kernel.s0.x.x*128, 1, 128) aligned(128, 0)] = ((int16x128)widening_mul(p0[ramp(t6, 1, 128)], x128((int8)2)) + int16x128(p0[ramp(t6 + -1, 1, 128)])) + int16x128(p0[ramp(t6 + 1, 1, 128)])

Now, codegen: 2 vsxt and 1 vmpybv.acc
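
For anyone skimming, the gist of the rewrite is to fold a constant shift amount into a constant multiplier of the narrow type, so the expression stays a widening_mul instead of being widened and then multiplied. Below is a minimal sketch of that idea, assuming Halide's internal IR helpers (as_const_uint, make_const, widening_mul, widening_shift_left); the function itself is illustrative, not the actual HexagonOptimize.cpp code:

#include "Halide.h"

using Halide::Expr;
using namespace Halide::Internal;

// Illustrative only: turn widening_shift_left(a, c) into widening_mul(a, 1 << c)
// when c is a constant and (1 << c) fits in a's narrow element type
// (checked conservatively, so it holds for signed types too).
Expr simplify_widening_shift(const Expr &a, const Expr &shift) {
    // as_const_uint sees through a broadcast shift amount such as x128((uint8)1).
    if (const uint64_t *c = as_const_uint(shift)) {
        if (*c + 1 < (uint64_t)a.type().bits()) {
            // e.g. an int8 operand shifted by 1 becomes a multiply by (int8)2,
            // which HVX can lower to a single widening vmpy.
            Expr factor = make_const(a.type(), (int64_t)1 << *c);
            return widening_mul(a, factor);
        }
    }
    // Otherwise leave the widening shift untouched.
    return widening_shift_left(a, shift);
}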

I also added a test to simd_op_check_hvx.
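
The new check exercises the pattern discussed further down in this thread, i16_1 + 2 * i16(i8_1). As a rough standalone equivalent outside the simd_op_check harness, something like the sketch below should now select a single vector vmpy for the multiply; the names, schedule, and output filename are illustrative, and only the target string is copied from the description above.

#include "Halide.h"

using namespace Halide;

int main() {
    // Stand-ins for simd_op_check's i8_1 / i16_1 inputs.
    ImageParam in_i8(Int(8), 1), in_i16(Int(16), 1);
    Var x("x");
    Func f("f");
    f(x) = in_i16(x) + 2 * cast<int16_t>(in_i8(x));
    f.vectorize(x, 128);
    // Target string copied from the PR description above.
    Target t("hexagon-64-noos-no_bounds_query-no_asserts-hvx_128-hvx_v66");
    f.compile_to_assembly("widening_mul_check.s", {in_i8, in_i16}, "widening_mul_check", t);
    return 0;
}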

@rootjalex (Member, Author)

Thanks for the speedy review @steven-johnson :)

@pranavb-ca (Contributor) left a comment

Thanks for the PR @rootjalex

@steven-johnson (Contributor)

OK to land, remaining buildbot is irrelevant

rootjalex merged commit 69b50af into main on Sep 12, 2022
rootjalex deleted the rootjalex/distribute-w_shl branch on Sep 12, 2022 at 23:51
@vksnk (Member) commented Sep 14, 2022

In one of the Google generators, we are getting the following error:

LLVM ERROR: Error while trying to spill V16 from class HvxVR: Cannot scavenge register without an emergency spill slot!
*** SIGABRT received by PID 9628 (TID 9628) on cpu 13 from PID 9628; stack trace: ***
PC: @ 0x7f133326e347 (unknown) gsignal
@ 0x563dc2bffd94 976 FailureSignalHandler()
@ 0x7f13333c91c0 (unknown) (unknown)
@ 0x7f133326e347 136 gsignal
@ 0x7f133326f797 304 abort
@ 0x563dc266d670 224 llvm::report_fatal_error()
@ 0x563dc120cabb 480 llvm::RegScavenger::spill()
@ 0x563dc120d824 272 llvm::RegScavenger::scavengeRegisterBackwards()
@ 0x563dc120dfcb 80 scavengeVReg()
@ 0x563dc120dc12 128 scavengeFrameVirtualRegsInBlock()
@ 0x563dc120d968 64 llvm::scavengeFrameVirtualRegs()
@ 0x563dc11610c5 1472 (anonymous namespace)::PEI::runOnMachineFunction()
@ 0x563dc1082903 960 llvm::MachineFunctionPass::runOnFunction()
@ 0x563dc246dbec 192 llvm::FPPassManager::runOnFunction()
@ 0x563dc2474c03 48 llvm::FPPassManager::runOnModule()
@ 0x563dc246e2b2 352 llvm::legacy::PassManagerImpl::run()
@ 0x563dbf879d8e 912 Halide::emit_file()
@ 0x563dbf8b0997 8800 Halide::Internal::compile_module_to_hexagon_shared_object()
@ 0x563dbf89ceae 8480 Halide::Module::compile_to_buffer()
@ 0x563dbf89d621 112 Halide::Module::resolve_submodules()
@ 0x563dbf89ea88 1008 Halide::Module::compile()
@ 0x563dbf8a2f8e 1424 Halide::compile_multitarget()
@ 0x563dbf66aaa8 752 Halide::Internal::execute_generator()

Reverting this PR fixes the error. Any suggestions as to what might be going on? @rootjalex @pranavb-ca

@steven-johnson (Contributor)

Yeah, we're going to have to revert this change, at least temporarily

@rootjalex (Member, Author)

(This is speculation; I don't know the HVX backend very well beyond instruction selection.)

I suspect the issue can be highlighted by the test I added, i16_1 + 2 * i16(i8_1): before this PR, this would have been two vmpyi instructions with one vector register and a regular register for the constant, but now it's one vmpy with two vector registers, one of which stores the broadcasted constant. I see no reason that the codegen should use the broadcasted register instead of the vector-scalar version.

I can't see any other reason that this PR would cause that error, but @pranavb-ca would certainly know better than I.

@rootjalex (Member, Author)

Ah, I think there is only an int8 vector-vector vmpy; there is no int8 vector-scalar vmpy.

@rootjalex (Member, Author) commented Sep 14, 2022

Actually, that can't be the issue; there is no vector-scalar vmpyi instruction either. So I was incorrect: the original compilation would have been two vector-vector vmpyi instructions with a broadcasted constant, versus one vector-vector vmpy instruction with a broadcasted constant.

Unfortunately, that means I have no idea how this was caused.

@steven-johnson (Contributor)

Unfortunately, this failure is happening inside a heinously complicated chunk of Halide code which I can't share as-is. Getting a small repro case is gonna be challenging.

steven-johnson added a commit that referenced this pull request Sep 14, 2022
Revert "[HVX] Simplify constant factor before distributing (#7009)"

This reverts commit 69b50af.
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* simplify constant factor before distributing

* add simd_op_check test
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
Revert "[HVX] Simplify constant factor before distributing (halide#7009)"

This reverts commit 69b50af.