Improve F14find* by 5%-10% on Aarch64 (#2378)

Summary:
Pull Request resolved: #2378

The diff simplifies the loop within F14find* by moving a shift operation from the loop condition to the loop initialization, so the shift no longer runs on every iteration. It also reduces the number of values that must be live simultaneously, potentially improving CPU register usage. Additionally, on aarch64 it allows the compiler to use the subs instruction, which decrements the counter and sets the condition flags in one step.

The following disassembly shows all of these theoretical benefits being exercised (four instructions of loop overhead before, three after):

before:
  2dcd54: 91000508 add x8, x8, #0x1
  2dcd58: 9ac9250f lsr x15, x8, x9
  2dcd5c: 8b1001ce add x14, x14, x16
  2dcd60: b4fffccf cbz x15, 2dccf8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>

after:
  2dce14: f100054a subs x10, x10, #0x1
  2dce18: 8b0e01ad add x13, x13, x14
  2dce1c: 54fffce1 b.ne 2dcdb8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4> // b.any

Reviewed By: Gownta, embg

Differential Revision: D69056923

fbshipit-source-id: 2e7216986a751aade943985f2b43ee4e7edda4fa
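
As a rough illustration of why the two loop forms are interchangeable, the sketch below counts iterations for each. It is not the F14 implementation: chunkCountFor, iterationsCountingUp and iterationsCountingDown are hypothetical names, and the sketch assumes that chunkCount() equals 1 << chunkShift() and that the loop body never reads the value of tries (the visible hunks suggest this, but do not prove it).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for chunkCount(); assumes the chunk count is always
// a power of two derived from the shift, i.e. chunkCount == 1 << chunkShift.
constexpr std::size_t chunkCountFor(std::size_t chunkShift) {
  return std::size_t{1} << chunkShift;
}

// Old loop shape: count up and shift the counter in the condition on every
// iteration. The loop ends once tries reaches 1 << chunkShift.
std::size_t iterationsCountingUp(std::size_t chunkShift) {
  std::size_t iterations = 0;
  for (std::size_t tries = 0; tries >> chunkShift == 0; ++tries) {
    ++iterations;
  }
  return iterations;
}

// New loop shape: compute the bound once in the initializer, then count down
// to zero. The condition is a plain comparison against zero, which a compiler
// can typically fold into a flag-setting decrement.
std::size_t iterationsCountingDown(std::size_t chunkShift) {
  std::size_t iterations = 0;
  for (std::size_t tries = chunkCountFor(chunkShift); tries > 0; --tries) {
    ++iterations;
  }
  return iterations;
}
```

Both functions should return the same count for any shift smaller than the bit width of std::size_t. The count-down form only needs the counter itself in the loop, whereas the count-up form also keeps the shift amount live.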
facebook · Feb 4, 2025 · 9d0b066
1 parent 78860f0
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/folly/container/detail/F14Table.h b/folly/container/detail/F14Table.h
@@ -1592,7 +1592,7 @@ class F14Table : public Policy {
     std::size_t index = hp.first;
     std::size_t step = probeDelta(hp);
     auto needleV = loadNeedleV(hp.second);
-    for (std::size_t tries = 0; tries >> chunkShift() == 0; ++tries) {
+    for (std::size_t tries = chunkCount(); tries > 0; --tries) {
       ChunkPtr chunk = chunks_ + moduloByChunkCount(index);
       if (prefetch == Prefetch::ENABLED && sizeof(Chunk) > 64) {
         prefetchAddr(chunk->itemAddr(8));
@@ -1669,7 +1669,7 @@ class F14Table : public Policy {
     std::size_t index = hp.first;
     auto needleV = loadNeedleV(hp.second);
     std::size_t step = probeDelta(hp);
-    for (std::size_t tries = 0; tries >> chunkShift() == 0; ++tries) {
+    for (std::size_t tries = chunkCount(); tries > 0; --tries) {
       ChunkPtr chunk = chunks_ + moduloByChunkCount(index);
       if (sizeof(Chunk) > 64) {
         prefetchAddr(chunk->itemAddr(8));