Commit
Add a VPERMQ comment to flip_avx_acepck
okuhara committed Sep 16, 2024
1 parent 2199e33 commit 680d347
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/flip_avx_acepck.c
@@ -101,6 +101,10 @@ __m128i vectorcall mm_Flip(const __m128i OP, int pos)
 	__m256i PP, OO, flip, mask, rP, rS, rE, lO, lF;

 	PP = _mm256_broadcastq_epi64(OP);
+	// Nyanyan reported VPERMQ version is faster on Raptor Lake
+	// https://github.com/Nyanyan/Egaroucid/pull/320
+	// but VPERMQ is slow on AMD.
+	// OO = _mm256_permute4x64_epi64(_mm256_castsi128_si256(OP), 0x55);
 	OO = _mm256_broadcastq_epi64(_mm_unpackhi_epi64(OP, OP));

 	mask = lrmask[pos].v4[1];
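
Both forms of the OO assignment copy the high 64 bits of OP into all four lanes. The immediate 0x55 is 0b01010101, so each 2-bit selector of VPERMQ picks source lane 1. The following is a minimal scalar sketch of that lane movement, not the author's code: `lanes4`, `model_permute4x64`, `model_cast_permute`, `model_unpackhi_broadcast`, and `variants_agree` are hypothetical names introduced here, and the upper lanes left undefined by `_mm256_castsi128_si256` are modeled as zero on the assumption that they are never read.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a 256-bit register viewed as four 64-bit lanes.
   Hypothetical type, used only for illustration. */
typedef struct { uint64_t q[4]; } lanes4;

/* Models _mm256_permute4x64_epi64(src, imm): destination lane i receives
   the source lane selected by bits [2i+1:2i] of imm. */
static lanes4 model_permute4x64(lanes4 src, unsigned imm)
{
    lanes4 dst;
    for (int i = 0; i < 4; i++)
        dst.q[i] = src.q[(imm >> (2 * i)) & 3];
    return dst;
}

/* Models the commented-out line:
   _mm256_permute4x64_epi64(_mm256_castsi128_si256(OP), 0x55).
   The cast leaves lanes 2 and 3 undefined; they are set to zero here
   (an assumption) because selector 0x55 only ever reads lane 1. */
static lanes4 model_cast_permute(uint64_t lo, uint64_t hi)
{
    lanes4 src = {{ lo, hi, 0, 0 }};
    return model_permute4x64(src, 0x55);
}

/* Models the line that is kept:
   _mm256_broadcastq_epi64(_mm_unpackhi_epi64(OP, OP)).
   unpackhi yields {hi, hi}; the broadcast then replicates its low lane. */
static lanes4 model_unpackhi_broadcast(uint64_t lo, uint64_t hi)
{
    (void)lo;                          /* the low qword is discarded */
    uint64_t unpacked[2] = { hi, hi };
    lanes4 dst = {{ unpacked[0], unpacked[0], unpacked[0], unpacked[0] }};
    return dst;
}

/* Returns 1 when both models place the high qword in every lane. */
static int variants_agree(uint64_t lo, uint64_t hi)
{
    lanes4 a = model_cast_permute(lo, hi);
    lanes4 b = model_unpackhi_broadcast(lo, hi);
    for (int i = 0; i < 4; i++) {
        if (a.q[i] != b.q[i] || a.q[i] != hi)
            return 0;
    }
    assert(a.q[0] == hi);              /* sanity check on the model itself */
    return 1;
}
```

Calling `variants_agree(lo, hi)` with any pair of 64-bit values should return 1. The model only shows that the two sequences produce the same result; it says nothing about the Raptor Lake versus AMD speed difference mentioned in the comment, which has to be measured on the actual hardware.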