Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Don't struct-promote opaque vectors #21314

Merged
merged 1 commit into from
Dec 6, 2018

Conversation

CarolEidt
Copy link

The hardware vector types: Vector64<T>, Vector128<T> and Vector256<T> are declared
as having one or more fields of ulong. However, the JIT shouldn't be promoting these
fields to local variables. It is almost never the right type, and the intrinsics in any
case are not designed to cooperate with promoted fields (i.e. an index of a Vector<ulong>
won't map to the promoted lclVar).

Most importantly, it causes all copies of the vector to be done as 64-bit integer loads
and stores.

Finally, it will be important, as we support vector ABIs, to distinguish the handling of
the fixed-size vectors (Vector2, Vector3 and Vector4) which are considered to be
normal structs of N floats, from the opaque types which will be passed in vector registers.

@mikedn
Copy link

mikedn commented Nov 30, 2018

Wasn't lvUsedInSIMDIntrinsic supposed to prevent promotion already?

@CarolEidt
Copy link
Author

Wasn't lvUsedInSIMDIntrinsic supposed to prevent promotion already?

Yes, but it is not used for the HW intrinsic vectors, and even for the other SIMD types, it is not set if the vector is only copied or passed/returned.

@mikedn
Copy link

mikedn commented Nov 30, 2018

Yes, but it is not used for the HW intrinsic vectors

Hmm, are the SetOpLclRelatedToSIMDIntrinsic calls in gtNewSimdHWIntrinsicNode unnecessary then?

@CarolEidt
Copy link
Author

Hmm, are the SetOpLclRelatedToSIMDIntrinsic calls in gtNewSimdHWIntrinsicNode unnecessary then?

Ah, I missed that. Yes, I think they will now be unnecessary, though one could consider that an optimization of sorts, since it will bypass the second check in the struct promotion test.

@CarolEidt CarolEidt force-pushed the DontPromoteHwVector branch from 714c47e to 0197694 Compare December 1, 2018 00:06
@tannergooding
Copy link
Member

since it will bypass the second check in the struct promotion test

Might it be worthwhile to enforce we set the flag and assert(!isOpaqueSIMDLclVar(varDsc) on the other code path?

@CarolEidt
Copy link
Author

Might it be worthwhile to enforce we set the flag and assert(!isOpaqueSIMDLclVar(varDsc) on the other code path?

I'm not sure what flag or "other code path" you're referring to?

@tannergooding
Copy link
Member

I meant keeping if (varDsc->lvIsSIMDType() && varDsc->lvIsUsedInSIMDIntrinsic()) (which requires we set IsUsedInSIMDIntrinsic()) and assert isOpaqueSIMDLclVar(varDesc) for the else case.

But, that wouldn't work for the cases where someone uses Vector128<T> and isn't also using intrinsics. There are also a number of other blocks that would make adding the assert more troublesome; so you can just ignore me 😄

@AaronRobinsonMSFT
Copy link
Member

@dotnet-bot test Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness please
@dotnet-bot test Windows_NT x86 Release Innerloop Build and Test please

The hardware vector types: `Vector64<T>`, `Vector128<T>` and `Vector256<T>` are declared
as having one or more fields of `ulong`. However, the JIT shouldn't be promoting these
fields to local variables. It is almost never the right type, and the intrinsics in any
case are not designed to cooperate with promoted fields (i.e. an index of a `Vector<ulong>`
won't map to the promoted lclVar).

Most importantly, it causes all copies of the vector to be done as 64-bit integer loads
and stores.

Finally, it will be important, as we support vector ABIs, to distinguish the handling of
the fixed-size vectors (`Vector2`, `Vector3` and `Vector4`) which *are* considered to be
normal structs of N floats, from the opaque types which will be passed in vector registers.
@CarolEidt CarolEidt force-pushed the DontPromoteHwVector branch from 0197694 to f06134e Compare December 1, 2018 16:10
@CarolEidt
Copy link
Author

@dotnet/jit-contrib PTAL

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fix a GitHub issue?

@CarolEidt
Copy link
Author

Does this fix a GitHub issue?

No, I don't believe so. I discovered this when I was seeing unexpected (poor) codegen while working on #21304.

I should probably check some of the struct-related issues to see if any of them are impacted by this. I have had some system issues so haven't been able to run full diffs.

On linux, I get 113 bytes (-0.002%) reduction in SPC.dll. There are 3 methods (actually 3 instantiations of the same method) that regressed:

  • System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:get_Zero() - also the Int32 and Int64 versions
    • This generates a xorps; vmovupd instead of xor;mov;mov since the fields aren't promoted. This is a wash size-wise but then we also get a vzeroupper in the prolog because of the 32-byte operation.

Clearly there's room for improved heuristics such that we could treat these "as if" they are promoted to individual fields of the base type if merited. But the current approach is quite pessimizing.

@CarolEidt
Copy link
Author

OK, I finally had a chance to run diffs on Windows (full frameworks and tests this time), and spot-check the results:

  • X64 windows: -19855 (-0.02% of base), 2157 total methods with size differences (2131 improved, 26 regressed), 463246 unchanged.
  • X86 windows: -20530 (-0.02% of base), 2226 total methods with size differences (2193 improved, 33 regressed), 461433 unchanged

Overall, the reductions are places where we eliminated struct copies of arguments, and replaced pairs of 16-byte moves from memory with 32-byte moves from memory or register.

The regressions are variations on:

  • Code size is a mostly wash, except for the addition of one or more vzerouppers.
  • CSE effects where we create additional CSEs (e.g. due to framesize being smaller without an extra SIMD lclVar for copying) causing additional register resolution or spill.

@CarolEidt CarolEidt merged commit 61da68e into dotnet:master Dec 6, 2018
@CarolEidt CarolEidt deleted the DontPromoteHwVector branch December 6, 2018 21:50
CarolEidt added a commit to CarolEidt/coreclr that referenced this pull request Jan 28, 2019
- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for #19910 (fixed with this PR) and #3539 & #19910 (fixed with dotnet#21314)
- Additional cleanup
CarolEidt added a commit to CarolEidt/coreclr that referenced this pull request Mar 19, 2019
- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for #19910 (fixed with this PR) and #3539 & #19438 (fixed with dotnet#21314)
- Additional cleanup

Fix #19910
CarolEidt added a commit to CarolEidt/coreclr that referenced this pull request Mar 21, 2019
- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for #19910 (fixed with this PR) and #3539 & #19438 (fixed with dotnet#21314)
- Additional cleanup

Fix #19910
mikedn pushed a commit to mikedn/coreclr that referenced this pull request Mar 23, 2019
- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for #19910 (fixed with this PR) and #3539 & #19438 (fixed with dotnet#21314)
- Additional cleanup

Fix #19910
CarolEidt added a commit that referenced this pull request Mar 28, 2019
* [WIP] Struct & SIMD improvements

- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for #19910 (fixed with this PR) and #3539 & #19438 (fixed with #21314)
- Additional cleanup

Fix #19910
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…ector

Don't struct-promote opaque vectors

Commit migrated from dotnet/coreclr@61da68e
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* [WIP] Struct & SIMD improvements

- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for dotnet/coreclr#19910 (fixed with this PR) and dotnet/coreclr#3539 & dotnet/coreclr#19438 (fixed with dotnet/coreclr#21314)
- Additional cleanup

Fix dotnet/coreclr#19910

Commit migrated from dotnet/coreclr@3d4a1d5
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants