hash64 generate different value as other language implements #51

Closed
d3m3vilurr opened this issue Aug 30, 2024 · 4 comments · Fixed by #52

@d3m3vilurr
Contributor

Similar to #27.
The hash32/hash64 family generates values that differ from other implementations (I don't know why; the fingerprint family generates the correct values).

For example, in Python:

>>> farmhash.hash64('foo')
6150913649986995171
>>> farmhash.fingerprint64('foo')
6150913649986995171
>>>

farmhash-modern

> x.hash64('foo')
6150913649986995171n
> x.fingerprint64('foo')
6150913649986995171n

but farmhash

> fh.hash64('foo')
'444527491298465133'
> fh.fingerprint64('foo')
'6150913649986995171'

Is this a bug, or am I missing something?

@lovell
Owner

lovell commented Aug 30, 2024

Did you see https://github.com/lovell/farmhash?tab=readme-ov-file#hash ?

The hash methods are platform dependent. Different CPU architectures, for example 32-bit vs 64-bit, Intel vs ARM, SSE4.2 vs AVX might produce different results for a given input.

WebAssembly provides what is essentially a virtual CPU so the hash-based methods will produce a different result from an implementation that can take advantage of SIMD features of a physical CPU.

If you need consistency, please use the fingerprint-based methods.

The fingerprint methods are platform independent, producing the same results for a given input on any machine.

https://github.com/lovell/farmhash?tab=readme-ov-file#fingerprint

@d3m3vilurr
Contributor Author

Well... that might be it.
I had ruled out that possibility because I tested a manually built Python lib and this package on the same computer.

Both projects use the .cc source code for the lib implementation and, imo, these .cc files are almost the same.
Only the Node version produced a different result.
I'll recheck the compile options and other settings.

@d3m3vilurr
Contributor Author

OK, I found the problem.

First, in my case, the input string length is 3 (foo).
The normal hash function calls NAMESPACE_FOR_HASH_FUNCTIONS::Hash64, and Fingerprint64 uses the farmhashna version.

The main logic is really the same when the length is this small
(because every HashLen0to16 for length <= 3 has the same implementation, without SIMD or CPU acceleration).
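
The short-input path described above can be reproduced outside the library. This is a minimal pure-Python sketch, assuming the 1 to 3 byte branch of HashLen0to16 and the constants k0 and k2 match upstream farmhash.cc as transcribed here (both are assumptions to check against the source; the Python function names are made up for illustration). Under those assumptions it yields the fingerprint64 value quoted earlier in this thread for 'foo'.

```python
MASK64 = (1 << 64) - 1

# Assumed values of k0 and k2 from farmhash.cc; verify against upstream.
K0 = 0xC3A5C85C97CB3127
K2 = 0x9AE16A3B2F90404F


def shift_mix(v: int) -> int:
    """Fold the high bits of a 64-bit value into the low bits."""
    return v ^ (v >> 47)


def hash_len_1_to_3(data: bytes) -> int:
    """Model the 1..3 byte branch of HashLen0to16 (hypothetical helper name)."""
    n = len(data)
    if not 1 <= n <= 3:
        raise ValueError("this sketch only covers inputs of 1 to 3 bytes")
    a, b, c = data[0], data[n >> 1], data[n - 1]
    y = a + (b << 8)
    z = n + (c << 2)
    # The C++ code multiplies in uint64_t, so wrap every product to 64 bits.
    mixed = shift_mix(((y * K2) ^ (z * K0)) & MASK64)
    return (mixed * K2) & MASK64


print(hash_len_1_to_3(b"foo"))  # 6150913649986995171
```

No SIMD or architecture-specific code is involved on this path, which is why the raw value is expected to agree across builds for such a short input. Longer inputs take other branches that are not modeled here.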

But the Node version still produces different values.

The main problem is the last step.

#if !defined(FARMHASH_DEBUG) && (!defined(NDEBUG) || defined(_DEBUG))
#define FARMHASH_DEBUG 1
#endif

...

template <typename T> STATIC_INLINE T DebugTweak(T x) {
  if (debug_mode) {
    if (sizeof(x) == 4) {
      x = ~Bswap32(x * c1);
    } else {
      x = ~Bswap64(x * k1);
    }
  }
  return x;
}

...

uint64_t Hash64(const char* s, size_t len) {
  return DebugTweak(
      (can_use_sse42 & x86_64) ?
      farmhashte::Hash64(s, len) :
      farmhashxo::Hash64(s, len));
}

Also, unlike the Python build, node-gyp never sets NDEBUG.

So the suggestion above is correct: different CPUs can produce different results.
But at least the project should prevent the debug swap :)
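
To check that the debug step alone accounts for the mismatch, the 64-bit branch of DebugTweak can be modeled in a few lines of Python. This is a sketch under stated assumptions: the value of k1 is transcribed from upstream farmhash.cc and should be verified there, and the helper names are hypothetical. Feeding it the fingerprint64 value for 'foo' from this thread gives the value that the Node hash64 call returned.

```python
MASK64 = (1 << 64) - 1

# Assumed value of k1 from farmhash.cc; verify against upstream.
K1 = 0xB492B66FBE98F273


def bswap64(v: int) -> int:
    """Reverse the byte order of a 64-bit value, like Bswap64."""
    return int.from_bytes(v.to_bytes(8, "little"), "big")


def debug_tweak64(x: int) -> int:
    """Model x = ~Bswap64(x * k1) with 64-bit wraparound."""
    return ~bswap64((x * K1) & MASK64) & MASK64


fingerprint = 6150913649986995171  # fingerprint64('foo') as reported above
print(debug_tweak64(fingerprint))  # 444527491298465133
```

The output equals the '444527491298465133' string reported for fh.hash64('foo'), which is consistent with the Node build running with debug_mode enabled.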

diff --git a/binding.gyp b/binding.gyp
index 511a782..f3d3483 100644
--- a/binding.gyp
+++ b/binding.gyp
@@ -25,6 +25,9 @@
         ]
       }]
     ],
+    'defines': [
+      'FARMHASH_DEBUG=0'
+    ],
     'xcode_settings': {
       'CLANG_CXX_LIBRARY': 'libc++',
       'GCC_ENABLE_CPP_EXCEPTIONS': 'YES',

After the patch, the results match :)
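
The effect of the define can be reasoned through with a small model of the preprocessor guard quoted earlier. This is only a sketch: it assumes that debug_mode is later derived from `#if FARMHASH_DEBUG` (that part is elided in the excerpt), and the function name and the dictionaries standing in for compiler defines are hypothetical.

```python
from typing import Mapping


def farmhash_debug_enabled(defines: Mapping[str, int]) -> bool:
    """Model the FARMHASH_DEBUG guard for a given set of compiler defines."""
    flags = dict(defines)
    # #if !defined(FARMHASH_DEBUG) && (!defined(NDEBUG) || defined(_DEBUG))
    # #define FARMHASH_DEBUG 1
    if "FARMHASH_DEBUG" not in flags and (
        "NDEBUG" not in flags or "_DEBUG" in flags
    ):
        flags["FARMHASH_DEBUG"] = 1
    # Assumption: an undefined macro evaluates to 0 in a later #if FARMHASH_DEBUG.
    return bool(flags.get("FARMHASH_DEBUG", 0))


print(farmhash_debug_enabled({}))                     # True: no NDEBUG, as reported for node-gyp
print(farmhash_debug_enabled({"NDEBUG": 1}))          # False: NDEBUG set, as reported for the Python build
print(farmhash_debug_enabled({"FARMHASH_DEBUG": 0}))  # False: the define added by the patch
```

Setting FARMHASH_DEBUG explicitly short-circuits the guard, so the result no longer depends on whether the build system happens to define NDEBUG.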

d3m3vilurr added a commit to d3m3vilurr/farmhash that referenced this issue Sep 2, 2024
a last step of the hash functions runs one more swap if activated debug
mode.

this debug flags  makes slow down the processing speed and it also always
makes different result as fingerprint.

close lovell#51
@lovell
Owner

lovell commented Sep 2, 2024

Ah, great spot about the (lack of) FARMHASH_DEBUG define, that makes sense and I agree we should set this.

d3m3vilurr added a commit to d3m3vilurr/farmhash that referenced this issue Sep 4, 2024
@lovell lovell closed this as completed in #52 Sep 5, 2024