Commit
Improve F14find* by 5%-10% on Aarch64 (#2378)
Summary:
Pull Request resolved: #2378

The diff simplifies the loop within F14find* by moving a shift operation from the loop condition to the loop initialization. This removes the need to perform a shift on each iteration. It also reduces the number of values that must be live simultaneously, potentially improving CPU register usage. Additionally, on aarch64 it allows the use of the subs instruction.

The following disassembly shows all of the theoretical benefits being exercised:

before:
  2dcd54: 91000508 add x8, x8, #0x1
  2dcd58: 9ac9250f lsr x15, x8, x9
  2dcd5c: 8b1001ce add x14, x14, x16
  2dcd60: b4fffccf cbz x15, 2dccf8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>

after:
  2dce14: f100054a subs x10, x10, #0x1
  2dce18: 8b0e01ad add x13, x13, x14
  2dce1c: 54fffce1 b.ne 2dcdb8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4> // b.any

Reviewed By: Gownta, embg

Differential Revision: D69056923

fbshipit-source-id: 2e7216986a751aade943985f2b43ee4e7edda4fa
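The loop transformation described in the summary can be illustrated with a minimal sketch. This is not the actual folly code: the function names, the ProbeResult struct, and the index/step/shift parameters are hypothetical stand-ins chosen only to show the shape of the change. The first function recomputes a shift in the loop condition on every iteration; the second computes the trip count once and counts down to zero, which is the form a compiler can typically lower to a decrement-and-branch pair such as subs plus b.ne.

```cpp
#include <cstddef>

// Hypothetical result type used only to compare the two loop shapes.
struct ProbeResult {
  std::size_t iterations;
  std::size_t index;
};

// Shape before the change (illustrative): the shift sits in the loop
// condition, so it is evaluated on every iteration.
// Assumes shift is smaller than the bit width of std::size_t.
ProbeResult probeShiftInCondition(std::size_t index, std::size_t step, unsigned shift) {
  std::size_t iterations = 0;
  for (std::size_t tries = 0; (tries >> shift) == 0; ++tries) {
    index += step;  // stand-in for advancing to the next chunk
    ++iterations;
  }
  return {iterations, index};
}

// Shape after the change (illustrative): the shift is performed once during
// initialization, and the loop counts down to zero.
// Assumes shift is smaller than the bit width of std::size_t.
ProbeResult probeShiftInInit(std::size_t index, std::size_t step, unsigned shift) {
  std::size_t iterations = 0;
  for (std::size_t remaining = std::size_t{1} << shift; remaining != 0; --remaining) {
    index += step;  // stand-in for advancing to the next chunk
    ++iterations;
  }
  return {iterations, index};
}
```

Both loops run exactly 2^shift times, so the rewrite preserves behavior under the stated assumption. The countdown form needs only the remaining-iterations counter to stay live, whereas the original form keeps both the try counter and the shift amount live across the loop.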