[HVX] Simplify constant factor before distributing #7009

Merged: 2 commits merged into main from rootjalex/distribute-w_shl on Sep 12, 2022

Conversation

@rootjalex (Member)

The handling of widening_shift_left in HexagonOptimize's DistributeShiftsAsMuls was not producing widening_muls for constant shift amounts; this PR fixes that by simplifying the constant factor before distributing. Here's a small illustrative example:

ImageParam in_i8(Int(8), 1);
Func kernel("kernel");
Var x("x");

kernel(x) = i16(in_i8(x - 1)) + 2 * i16(in_i8(x)) + i16(in_i8(x + 1));
kernel.vectorize(x, 128);

Compiled with target="hexagon-64-noos-no_bounds_query-no_asserts-hvx_128-hvx_v66".

Before DistributeShiftsAsMuls:

kernel[ramp(kernel.s0.x.x*128, 1, 128) aligned(128, 0)] = ((int16x128)widening_shift_left(p0[ramp(t6, 1, 128)], x128((uint8)1)) + int16x128(p0[ramp(t6 + -1, 1, 128)])) + int16x128(p0[ramp(t6 + 1, 1, 128)])

Previously, after DistributeShiftsAsMuls:

kernel[ramp(kernel.s0.x.x*128, 1, 128) aligned(128, 0)] = ((int16x128(p0[ramp(t6, 1, 128)])*x128((int16)2)) + int16x128(p0[ramp(t6 + -1, 1, 128)])) + int16x128(p0[ramp(t6 + 1, 1, 128)])

Previously, codegen: 3 vsxt and 2 vmpyihb.acc

Now, after DistributeShiftsAsMuls:

kernel[ramp(kernel.s0.x.x*128, 1, 128) aligned(128, 0)] = ((int16x128)widening_mul(p0[ramp(t6, 1, 128)], x128((int8)2)) + int16x128(p0[ramp(t6 + -1, 1, 128)])) + int16x128(p0[ramp(t6 + 1, 1, 128)])

Now, codegen: 2 vsxt and 1 vmpybv.acc
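
For anyone skimming, the gist of the rewrite is to fold a constant shift amount into a constant multiplier of the narrow type, so the expression stays a widening_mul instead of being widened and then multiplied. Below is a minimal sketch of that idea, assuming Halide's internal IR helpers (as_const_uint, make_const, widening_mul, widening_shift_left); the function itself is illustrative, not the actual HexagonOptimize.cpp code:

#include "Halide.h"

using Halide::Expr;
using namespace Halide::Internal;

// Illustrative only: turn widening_shift_left(a, c) into widening_mul(a, 1 << c)
// when c is a constant and (1 << c) fits in a's narrow element type
// (checked conservatively, so it holds for signed types too).
Expr simplify_widening_shift(const Expr &a, const Expr &shift) {
    // as_const_uint sees through a broadcast shift amount such as x128((uint8)1).
    if (const uint64_t *c = as_const_uint(shift)) {
        if (*c + 1 < (uint64_t)a.type().bits()) {
            // e.g. an int8 operand shifted by 1 becomes a multiply by (int8)2,
            // which HVX can lower to a single widening vmpy.
            Expr factor = make_const(a.type(), (int64_t)1 << *c);
            return widening_mul(a, factor);
        }
    }
    // Otherwise leave the widening shift untouched.
    return widening_shift_left(a, shift);
}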

I also added a test to simd_op_check_hvx.
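
The new check exercises the pattern discussed further down in this thread, i16_1 + 2 * i16(i8_1). As a rough standalone equivalent outside the simd_op_check harness, something like the sketch below should now select a single vector vmpy for the multiply; the names, schedule, and output filename are illustrative, and only the target string is copied from the description above.

#include "Halide.h"

using namespace Halide;

int main() {
    // Stand-ins for simd_op_check's i8_1 / i16_1 inputs.
    ImageParam in_i8(Int(8), 1), in_i16(Int(16), 1);
    Var x("x");
    Func f("f");
    f(x) = in_i16(x) + 2 * cast<int16_t>(in_i8(x));
    f.vectorize(x, 128);
    // Target string copied from the PR description above.
    Target t("hexagon-64-noos-no_bounds_query-no_asserts-hvx_128-hvx_v66");
    f.compile_to_assembly("widening_mul_check.s", {in_i8, in_i16}, "widening_mul_check", t);
    return 0;
}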

@rootjalex (Member, Author)

Thanks for the speedy review @steven-johnson :)

@pranavb-ca (Contributor) left a comment

Thanks for the PR @rootjalex

@steven-johnson (Contributor)

OK to land, remaining buildbot is irrelevant

rootjalex merged commit 69b50af into main on Sep 12, 2022
rootjalex deleted the rootjalex/distribute-w_shl branch on Sep 12, 2022 at 23:51
@vksnk (Member) commented Sep 14, 2022

In one of the Google generators, we are getting the following error:

LLVM ERROR: Error while trying to spill V16 from class HvxVR: Cannot scavenge register without an emergency spill slot!
*** SIGABRT received by PID 9628 (TID 9628) on cpu 13 from PID 9628; stack trace: ***
PC: @ 0x7f133326e347 (unknown) gsignal
@ 0x563dc2bffd94 976 FailureSignalHandler()
@ 0x7f13333c91c0 (unknown) (unknown)
@ 0x7f133326e347 136 gsignal
@ 0x7f133326f797 304 abort
@ 0x563dc266d670 224 llvm::report_fatal_error()
@ 0x563dc120cabb 480 llvm::RegScavenger::spill()
@ 0x563dc120d824 272 llvm::RegScavenger::scavengeRegisterBackwards()
@ 0x563dc120dfcb 80 scavengeVReg()
@ 0x563dc120dc12 128 scavengeFrameVirtualRegsInBlock()
@ 0x563dc120d968 64 llvm::scavengeFrameVirtualRegs()
@ 0x563dc11610c5 1472 (anonymous namespace)::PEI::runOnMachineFunction()
@ 0x563dc1082903 960 llvm::MachineFunctionPass::runOnFunction()
@ 0x563dc246dbec 192 llvm::FPPassManager::runOnFunction()
@ 0x563dc2474c03 48 llvm::FPPassManager::runOnModule()
@ 0x563dc246e2b2 352 llvm::legacy::PassManagerImpl::run()
@ 0x563dbf879d8e 912 Halide::emit_file()
@ 0x563dbf8b0997 8800 Halide::Internal::compile_module_to_hexagon_shared_object()
@ 0x563dbf89ceae 8480 Halide::Module::compile_to_buffer()
@ 0x563dbf89d621 112 Halide::Module::resolve_submodules()
@ 0x563dbf89ea88 1008 Halide::Module::compile()
@ 0x563dbf8a2f8e 1424 Halide::compile_multitarget()
@ 0x563dbf66aaa8 752 Halide::Internal::execute_generator()

Reverting this PR fixes the error. Any suggestions as to what might be going on? @rootjalex @pranavb-ca

@steven-johnson (Contributor)

Yeah, we're going to have to revert this change, at least temporarily

@rootjalex (Member, Author)

(This is speculation; I don't know the HVX backend very well beyond instruction selection.)

I suspect the issue can be highlighted by the test I added, i16_1 + 2 * i16(i8_1): before this PR, this would have been two vmpyi instructions with one vector register and a regular register for the constant, but now it's one vmpy with two vector registers, one of which stores the broadcasted constant. I see no reason that the codegen should use the broadcasted register instead of the vector-scalar version.

I can't see any other reason that this PR would cause that error, but @pranavb-ca would certainly know better than I.

@rootjalex (Member, Author)

Ah, I think there is only an int8 vector-vector vmpy; there is no int8 vector-scalar vmpy.

@rootjalex (Member, Author) commented Sep 14, 2022

Actually, that can't be the issue; there is no vector-scalar vmpyi instruction either. So I was incorrect: the original compilation would have been two vector-vector vmpyi instructions with a broadcasted constant, versus one vector-vector vmpy instruction with a broadcasted constant.

Unfortunately, that means I have no idea how this was caused.

@steven-johnson (Contributor)

Unfortunately, this failure is happening inside a heinously complicated chunk of Halide code which I can't share as-is. Getting a small repro case is gonna be challenging.

steven-johnson added a commit that referenced this pull request Sep 14, 2022
Revert "[HVX] Simplify constant factor before distributing (#7009)"

This reverts commit 69b50af.
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* simplify constant factor before distributing

* add simd_op_check test
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
Revert "[HVX] Simplify constant factor before distributing (halide#7009)"

This reverts commit 69b50af.