ht_dec.c: Improve MSVC arm64 popcount performance #1479

Merged: 3 commits, Dec 9, 2023
src/lib/openjp2/ht_dec.c (13 additions, 0 deletions)
@@ -55,6 +55,16 @@
 #define OPJ_COMPILER_GNUC
 #endif
 
+#if defined(OPJ_COMPILER_MSVC) && defined(_M_ARM64) \
+    && !defined(_M_ARM64EC) && !defined(_M_CEE_PURE) && !defined(__CUDACC__) \
+    && !defined(__INTEL_COMPILER) && !defined(__clang__)
+#define MSVC_NEON_INTRINSICS
+#endif
+
+#ifdef MSVC_NEON_INTRINSICS
+#include <arm64_neon.h>
+#endif
+
 //************************************************************************/
 /** @brief Displays the error message for disabling the decoding of SPP and
  * MRP passes
@@ -71,6 +81,9 @@ OPJ_UINT32 population_count(OPJ_UINT32 val)
 {
 #if defined(OPJ_COMPILER_MSVC) && (defined(_M_IX86) || defined(_M_AMD64))
     return (OPJ_UINT32)__popcnt(val);
+#elif defined(OPJ_COMPILER_MSVC) && defined(MSVC_NEON_INTRINSICS)
+    const __n64 temp = neon_cnt(__uint64ToN64_v(val));
+    return neon_addv8(temp).n8_i8[0];
 #elif (defined OPJ_COMPILER_GNUC)
     return (OPJ_UINT32)__builtin_popcount(val);
 #else