Improve F14find* by 5%-10% on Aarch64 (#2378)
Summary:
Pull Request resolved: #2378

The diff simplifies the loop within F14find* by moving a shift operation from the condition to initialization.

This removes the need to perform a shift on each iteration. It also reduces the number of values that must be live simultaneously, potentially improving CPU register usage. Additionally, on aarch64 it allows the use of the subs instruction, which decrements the counter and sets the condition flags in one step.
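The equivalence of the two loop forms can be sketched as follows. This is a minimal illustration, not folly's code: it assumes that chunkCount() equals 1 << chunkShift(), and the function names tripsCountingUp and tripsCountingDown are hypothetical helpers that only count iterations.

```cpp
#include <cstddef>

// Hypothetical helper: the original loop shape. The shift is part of the
// condition, so it is re-evaluated on every iteration.
std::size_t tripsCountingUp(std::size_t chunkShift) {
  std::size_t trips = 0;
  for (std::size_t tries = 0; tries >> chunkShift == 0; ++tries) {
    ++trips;
  }
  return trips;
}

// Hypothetical helper: the new loop shape. The shift happens once in the
// initializer (assumed to match chunkCount()), then the counter runs down
// to zero, so the exit test is a plain comparison against zero.
std::size_t tripsCountingDown(std::size_t chunkShift) {
  std::size_t trips = 0;
  const std::size_t chunkCount = std::size_t{1} << chunkShift;
  for (std::size_t tries = chunkCount; tries > 0; --tries) {
    ++trips;
  }
  return trips;
}
```

Both helpers run the body 2^chunkShift times. The loop body in F14find* never reads tries, so the direction of the count does not affect which chunks are probed.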

The following disassembly shows all of these theoretical benefits being realized:

before:

  2dcd54:	91000508 	add	x8, x8, #0x1
  2dcd58:	9ac9250f 	lsr	x15, x8, x9
  2dcd5c:	8b1001ce 	add	x14, x14, x16
  2dcd60:	b4fffccf 	cbz	x15, 2dccf8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>

after:

  2dce14:	f100054a 	subs	x10, x10, #0x1
  2dce18:	8b0e01ad 	add	x13, x13, x14
  2dce1c:	54fffce1 	b.ne	2dcdb8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>  // b.any

Reviewed By: Gownta, embg

Differential Revision: D69056923

fbshipit-source-id: 2e7216986a751aade943985f2b43ee4e7edda4fa
Nicoshev authored and facebook-github-bot committed Feb 4, 2025
1 parent 78860f0 commit 9d0b066
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions folly/container/detail/F14Table.h
@@ -1592,7 +1592,7 @@ class F14Table : public Policy {
std::size_t index = hp.first;
std::size_t step = probeDelta(hp);
auto needleV = loadNeedleV(hp.second);
-for (std::size_t tries = 0; tries >> chunkShift() == 0; ++tries) {
+for (std::size_t tries = chunkCount(); tries > 0; --tries) {
ChunkPtr chunk = chunks_ + moduloByChunkCount(index);
if (prefetch == Prefetch::ENABLED && sizeof(Chunk) > 64) {
prefetchAddr(chunk->itemAddr(8));
@@ -1669,7 +1669,7 @@ class F14Table : public Policy {
std::size_t index = hp.first;
auto needleV = loadNeedleV(hp.second);
std::size_t step = probeDelta(hp);
-for (std::size_t tries = 0; tries >> chunkShift() == 0; ++tries) {
+for (std::size_t tries = chunkCount(); tries > 0; --tries) {
ChunkPtr chunk = chunks_ + moduloByChunkCount(index);
if (sizeof(Chunk) > 64) {
prefetchAddr(chunk->itemAddr(8));