-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crc32 Partitioning Scheme Incompatible with librdkafka #2430
Comments
This comment was marked as outdated.
This comment was marked as outdated.
@csm8118 thanks for raising this issue. Is it something you’re still interested in contributing? Presumably we’d have retain the old (wrong) method with a config option to switch to the compatible version |
@dnwe I'd be happy to contribute this. My initial thinking was to introduce something like |
I quite like the names that librdkafka assigns to it’s available partitioners:
Especially interesting as I hadn’t necessarily realised before that we are FNV-1a by default, librdkafka and its brethren are all CRC32 by default and Java is murmur2 by default |
I created #2560 to implement this change. I tried to pick good names for things. Let me know what you think of the changes! |
Thanks @dnwe! Closing this issue out. |
Versions
Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.
Problem Description
There is a difference between how sarama partitions when using the Crc32 hash and what is done in librdkafka (and ruby-kafka). It looks like this issue was reported some time ago but nothing came of it (#1213). The main issue is that librdkafka/ruby-kafka treat the hashed value from Crc32 as an unsigned value, whereas sarama casts it to a signed value. However, when using other hashing algorithms, the signed cast is necessary.
return rd_crc32(key, keylen) % partition_cnt;
@digest.hash(key) % partition_count
where@digest.hash
is defined asZlib.crc32(value)
partition = int32(p.hasher.Sum32()) % numPartitions
Here are some sample inputs that show the difference:
I created a custom partitioner (and tests) that replicates the ruby/librdkafka behavior. The relevant snippet of that code is below.
Sample testing output:
I would like to contribute this code to the sarama codebase, but I wanted to open this issue and have a discussion about it prior to creating a pull request.
The text was updated successfully, but these errors were encountered: