Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements for SpanHelpers.IndexOf #64872

Merged
merged 3 commits into from
Feb 7, 2022
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 6, 2022

Attempt to fix regressions caused by #63285 for large input: #64381

  1. use nint instead of int where possible to avoid sign extensions
  2. Slightly move "candidate found" block and assume we mostly find nothing (in theory, PGO will do the same)
  3. Hoist invariant searchSpaceMinusValueTailLength - Vector128<ushort>.Count expression - JIT only does it for natural loops so I had to do it by hands.
  4. Re-order cmpCh1 and cmpCh2 - it slightly improves pipelining (currently we emit a GPR instruction between two independent SIMD compare instructions)

Benchmarks: (Regressed ones and the ones from #63285)

Method Toolchain Pattern Mean Error StdDev Ratio
Count \Core_Root_PR\corerun.exe Sherlock 44.90 us 0.267 us 0.237 us 1.00
Count \Core_Root_base\corerun.exe Sherlock 51.14 us 0.136 us 0.120 us 1.14
Count \Core_Root_PR\corerun.exe Sherlock Holmes 45.15 us 0.281 us 0.263 us 1.00
Count \Core_Root_base\corerun.exe Sherlock Holmes 50.79 us 0.089 us 0.083 us 1.12
Count \Core_Root_PR\corerun.exe Sherlock\s+Holmes 48.21 us 0.182 us 0.170 us 1.00
Count \Core_Root_base\corerun.exe Sherlock\s+Holmes 53.07 us 0.105 us 0.093 us 1.10
Count \Core_Root_PR\corerun.exe The 108.63 us 0.157 us 0.131 us 1.00
Count \Core_Root_base\corerun.exe The 115.75 us 0.115 us 0.096 us 1.07
Count \Core_Root_PR\corerun.exe zqj 34.84 us 0.209 us 0.195 us 1.00
Count \Core_Root_base\corerun.exe zqj 39.76 us 0.131 us 0.123 us 1.14
Method Toolchain Mean Error StdDev Ratio
IndexOf_Walking \Core_Root_PR\corerun.exe 9.277 ns 0.0282 ns 0.0264 ns 1.00
IndexOf_Walking \Core_Root_base\corerun.exe 9.761 ns 0.0105 ns 0.0093 ns 1.05
Contains_HtmlTag \Core_Root_PR\corerun.exe 8.836 ns 0.0257 ns 0.0228 ns 1.00
Contains_HtmlTag \Core_Root_base\corerun.exe 9.044 ns 0.0099 ns 0.0087 ns 1.02
IndexOf_Cmake \Core_Root_PR\corerun.exe 17.686 ns 0.0659 ns 0.0617 ns 1.00
IndexOf_Cmake \Core_Root_base\corerun.exe 20.795 ns 0.1066 ns 0.0998 ns 1.18

No changes for LastIndexOf for now (we also don't have reported regressions for it atm)

cc @stephentoub

@ghost
Copy link

ghost commented Feb 6, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Attempt to fix regressions caused by #63285 for large input: #64381

  1. use nint instead of int where possible to avoid sign extensions
  2. Slightly move "candidate found" block and assume we mostly find nothing
  3. Hoist invariant searchSpaceMinusValueTailLength - Vector128<ushort>.Count expression - JIT only does it for natural loops so I had to do it by hands.
  4. Re-order cmpCh1 and cmpCh2 - it slightly improves pipelining (currently we emit a GPR instruction between two SIMD compare)

Benchmarks: (Regressed ones and the ones from #63285)

Method Toolchain Pattern Mean Error StdDev Ratio
Count \Core_Root_PR\corerun.exe Sherlock 44.90 us 0.267 us 0.237 us 1.00
Count \Core_Root_base\corerun.exe Sherlock 51.14 us 0.136 us 0.120 us 1.14
Count \Core_Root_PR\corerun.exe Sherlock Holmes 45.15 us 0.281 us 0.263 us 1.00
Count \Core_Root_base\corerun.exe Sherlock Holmes 50.79 us 0.089 us 0.083 us 1.12
Count \Core_Root_PR\corerun.exe Sherlock\s+Holmes 48.21 us 0.182 us 0.170 us 1.00
Count \Core_Root_base\corerun.exe Sherlock\s+Holmes 53.07 us 0.105 us 0.093 us 1.10
Count \Core_Root_PR\corerun.exe The 108.63 us 0.157 us 0.131 us 1.00
Count \Core_Root_base\corerun.exe The 115.75 us 0.115 us 0.096 us 1.07
Count \Core_Root_PR\corerun.exe zqj 34.84 us 0.209 us 0.195 us 1.00
Count \Core_Root_base\corerun.exe zqj 39.76 us 0.131 us 0.123 us 1.14
Method Toolchain Mean Error StdDev Ratio
IndexOf_Walking \Core_Root_PR\corerun.exe 9.277 ns 0.0282 ns 0.0264 ns 1.00
IndexOf_Walking \Core_Root_base\corerun.exe 9.761 ns 0.0105 ns 0.0093 ns 1.05
Contains_HtmlTag \Core_Root_PR\corerun.exe 8.836 ns 0.0257 ns 0.0228 ns 1.00
Contains_HtmlTag \Core_Root_base\corerun.exe 9.044 ns 0.0099 ns 0.0087 ns 1.02
IndexOf_Cmake \Core_Root_PR\corerun.exe 17.686 ns 0.0659 ns 0.0617 ns 1.00
IndexOf_Cmake \Core_Root_base\corerun.exe 20.795 ns 0.1066 ns 0.0998 ns 1.18

No changes for LastIndexOf for now (we also don't have reported regressions for it atm)

cc @stephentoub

Author: EgorBo
Assignees: EgorBo
Labels:

area-System.Memory

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Feb 6, 2022

NOTE: there is still a room for improvement, e.g. use the trick used in IndexOfAny - do the last iteration separately and it will allow us to remove a condition from the main loop.

@AndyAyersMS
Copy link
Member

@EgorBo the pattern here of branching back to LOOP_FOOTER confuses the JIT's loop recognition. I wonder if we'd be better off in the long run just duplicating the footer code?

(context: #87194 (comment))

@EgorBo EgorBo deleted the revert-indexo branch July 25, 2023 00:45
@EgorBo
Copy link
Member Author

EgorBo commented Jul 25, 2023

@EgorBo the pattern here of branching back to LOOP_FOOTER confuses the JIT's loop recognition. I wonder if we'd be better off in the long run just duplicating the footer code?

(context: #87194 (comment))

That was sort of hand-written PGO and can be simplified now, although, NativeAOT might slightly regress

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants