feat: merge apache kafka trunk #1030

superhx · 2024-03-26T12:41:53Z

No description provided.

Performs additional unwrap during handshake after data from client is processed to support openssl, which needs the extra unwrap to complete handshake. Reviewers: Ismael Juma <[email protected]>, Rajini Sivaram <[email protected]>

Adding the following rebalance metrics to the consumer: rebalance-latency-avg rebalance-latency-max rebalance-latency-total rebalance-rate-per-hour rebalance-total failed-rebalance-rate-per-hour failed-rebalance-total Due to the difference in protocol, we need to redefine when rebalance starts and ends. Start of Rebalance: Current: Right before sending out JoinGroup ConsumerGroup: When the client receives assignments from the HB End of Rebalance - Successful Case: Current: Receiving SyncGroup request after transitioning to "COMPLETING_REBALANCE" ConsumerGroup: After completing reconciliation and right before sending out "Ack" heartbeat End of Rebalance - Failed Case: Current: Any failure in the JoinGroup/SyncGroup response ConsumerGroup: Failure in the heartbeat Note: Afterall, we try to be consistent with the current protocol. Rebalances start and end with sending and receiving network requests. Failures in network requests signify the user failures in rebalance. And it is entirely possible to have multiple failures before having a successful one. Reviewers: Lucas Brutschy <[email protected]>

`poll(long timeout, TimeUnit unit)` is either used with `Long.MAX_VALUE` or `0`. This patch replaces it with `poll` and `take`. It removes the `awaitNanos` usage. Reviewers: Jeff Kim <[email protected]>, Justine Olshan <[email protected]>

Remove the space between two words Reviewers: Luke Chen <[email protected]>

This is the first part of the implementation of KIP-1005 The purpose of this pull request is for the broker to start returning the correct offset when it receives a -5 as a timestamp in a ListOffsets API request Reviewers: Luke Chen <[email protected]>, Kamal Chandraprakash <[email protected]>, Satish Duggana <[email protected]>

… (#15150) In KIP-848, we introduce the notion of Group Types based on the protocol type that the members in the consumer group use. As of now we support two types of groups: * Classic : Members use the classic consumer group protocol ( existing one ) * Consumer : Members use the consumer group protocol introduced in KIP-848. Currently List Groups allows users to list all the consumer groups available. KIP-518 introduced filtering the consumer groups by the state that they are in. We now want to allow users to filter consumer groups by type. This patch includes the changes to the admin client and related files. It also includes changes to parameterize the tests to include permutations of the old GC and the new GC with the different protocol types. Reviewers: David Jacot <[email protected]>

There are a few minor issues with the event sub-classes in the org.apache.kafka.clients.consumer.internals.events package that should be cleaned up: - Update the names of subclasses to remove "Application" or "Background" - Make toString() final in the base classes and clean up the implementations of toStringBase() - Fix minor whitespace inconsistencies - Make variable/method names consistent Reviewer: Bruno Cadonna <[email protected]>

I have noticed the following log when a __consumer_offsets partition immigrate from a broker. It appends because the event is queued up after the event that unloads the state machine. This patch fixes it and fixes another similar one. ``` [2024-02-06 17:14:51,359] ERROR [GroupCoordinator id=1] Execution of UpdateImage(tp=__consumer_offsets-28, offset=13251) failed due to This is not the correct coordinator.. (org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime) org.apache.kafka.common.errors.NotCoordinatorException: This is not the correct coordinator. ``` Reviewers: Justine Olshan <[email protected]>

When the consumer is closed, we perform a sychronous autocommit. We don't want to be woken up here, because we are already executing a close operation under a deadline. This is in line with the behavior of the old consumer. This fixes PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup which is flaky on trunk - because we return immediately from the synchronous commit with a WakeupException, which causes us to not wait for the commit to finish and thereby sometimes miss the committed offset when a new consumer is created. Reviewers: Matthias J. Sax <[email protected]>, Bruno Cadonna <[email protected]>

…#15440) The internal SubscriptionState object keeps track of whether the assignment is user-assigned, or auto-assigned. If there are no assigned partitions, the assignment resets to NONE. If you call SubscriptionState.assignFromSubscribed in this state it fails. This change makes sure to check SubscriptionState.hasAutoAssignedPartitions() so that assignFromSubscribed is going to be permitted. Also, a minor refactoring to make clearing the subscription a bit easier to follow in MembershipManagerImpl. Testing via new unit test. Reviewers: Bruno Cadonna <[email protected]>, Andrew Schofield <[email protected]>

…s in SASL/OAUTHBEARER (#14818) # Overview * This change pertains to [SASL/OAUTHBEARER ](https://kafka.apache.org/documentation/#security_sasl_oauthbearer) mechanism of Kafka authentication. * Kafka clients can use [SASL/OAUTHBEARER ](https://kafka.apache.org/documentation/#security_sasl_oauthbearer) mechanism by overriding the [custom call back handlers](https://kafka.apache.org/documentation/#security_sasl_oauthbearer_prod) . * [KIP-768](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575) available from v3.1 further extends the mechanism with a production grade implementation. * Kafka's [SASL/OAUTHBEARER ](https://kafka.apache.org/documentation/#security_sasl_oauthbearer) mechanism currently **rejects the non-JWT (i.e. opaque) tokens**. This is because of a more restrictive set of characters than what [RFC-6750](https://datatracker.ietf.org/doc/html/rfc6750#section-2.1) recommends. * This JIRA can be considered an extension of [KIP-768](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575) to support the opaque tokens as well apart from the JWT tokens. # Solution * Have updated the regex in the the offending class to be compliant with the [RFC-6750](https://datatracker.ietf.org/doc/html/rfc6750#section-2.1) * Have provided a supporting test case that includes the possible character set defined in [RFC-6750](https://datatracker.ietf.org/doc/html/rfc6750#section-2.1) --------- Co-authored-by: Anuj Sharma <[email protected]> Co-authored-by: Jamie Holmes <[email protected]> Co-authored-by: Christopher Webb <[email protected]> Reviewers: Manikumar Reddy <[email protected]>, Kirk True <[email protected]>

@provide

This minor pull request consist of upgrading version of jqwik library to version 1.8.0 that brings some bug fixing and some enhancements, upgrading the version now will make future upgrades easier For breaking changes: We are not using ArbitraryConfiguratorBase, so there is no overriding of configure method We are not using TypeUsage.canBeAssignedTo(TypeUsage) No breaking is related to @provide and @forall usage no Exception CannotFindArbitraryException is thrown during tests running No usage of StringArbitrary.repeatChars(0.0) We are not affected by the removal of method TypeArbitrary.use(Executable) We are not affected by the removal or methods ActionChainArbitrary.addAction(action) and ActionChainArbitrary.addAction(weight, action) For more details check the release notes: https://jqwik.net/release-notes.html#180 Reviewers: Chia-Ping Tsai <[email protected]>, Yash Mayya <[email protected]>

Reviewers: Chia-Ping Tsai <[email protected]>

These projects don't actually use easymock/powermock. Reviewers: Chia-Ping Tsai <[email protected]>

Global state stores are currently flushed at each commit, which may impact performance, especially for EOS (commit each 200ms). The goal of this improvement is to flush global state stores only when the delta between the current offset and the last checkpointed offset exceeds a threshold. This is the same logic we apply on local state store, with a threshold of 10000 records. The implementation only flushes if the time interval elapsed and the threshold of 10000 records is exceeded. Reviewers: Jeff Kim <[email protected]>, Bruno Cadonna <[email protected]>

This comment was added by #12862 The method with the comment was originally named updateLastSend, but its name was later changed to onSendAttempt. This method doesn't increment numAttempts. It seems that the numAttempts is only modified after a Request succeeds or fails. Reviewers: Chia-Ping Tsai <[email protected]>

…et (#15426) Currently, in the async Kafka consumer updates to the group metadata that are received by the heartbeat are propagated to the application thread in form of an event. Group metadata is updated when a new assignment is received. The new assignment is directly set in the subscription without sending an update event from the background thread to the application thread. That means that there might be a delay between the application thread being aware of the update to the assignment and the application thread being aware of the update to the group metadata. This delay can cause stale group metadata returned by the application thread that then causes issues when data of the new assignment is committed. A concrete example is producer.sendOffsetsToTransaction(offsetsToCommit, groupMetadata) The offsets to commit might already stem from the new assignment but the group metadata might relate to the previous assignment. Reviewers: Kirk True <[email protected]>, Andrew Schofield <[email protected]>, Lucas Brutschy <[email protected]>

Reviewers: Chia-Ping Tsai <[email protected]>

A foreign-key-join might drop a "subscription response" message, if the value-hash changed. This PR adds support to record such event via the existing "dropped records" sensor. Reviewers: Matthias J. Sax <[email protected]>

…arseString corruption (#15399) * KAFKA-16288: Prevent ClassCastExceptions for strings in Values.convertToDecimal * KAFKA-16289: Values inferred schemas for map and arrays should ignore element order Signed-off-by: Greg Harris <[email protected]> Reviewers: Chris Egerton <[email protected]>

@throws

…llback (#15437) The javadocs for commitAsync() (w/o callback) say: @throws org.apache.kafka.common.errors.FencedInstanceIdException if this consumer instance gets fenced by broker. If no callback is passed into commitAsync(), no offset commit callback invocation is submitted. However, we only check for a FencedInstanceIdException when we execute a callback. When the consumer gets fenced by another consumer with the same group.instance.id, and we do not use a callback, we miss the exception. This change modifies the behavior to propagate the FencedInstanceIdException even if no callback is used. The code is kept very similar to the original consumer. We also change the order - first try to throw the fenced exception, then execute callbacks. That is the order in the original consumer so it's safer to keep it this way. For testing, we add a unit test that verifies that the FencedInstanceIdException is thrown in that case. Reviewers: Philip Nee <[email protected]>, Matthias J. Sax <[email protected]>

In order to move ConfigCommand to tools we must move all it's dependencies which includes KafkaConfig and other core classes to java. This PR moves log cleaner configuration to CleanerConfig class of storage module. Reviewers: Chia-Ping Tsai <[email protected]>

This pr parameterize some group ids in GroupMetadataManagerTestContext that are now constant strings. Reviewers: Chia-Ping Tsai <[email protected]>

Remove the test constructor for PartitionAssignment and remove the TODO. Also add KRaftClusterTest.testCreatePartitions to get more coverage for createPartitions. Reviewers: David Arthur <[email protected]>, Chia-Ping Tsai <[email protected]>

It seems likely that BrokerServer was built upon the KafkaServer codebase.(#10113) KafkaServer, using Zookeeper, separates controlPlane and dataPlane to implement KIP-291. In KRaft, the roles of DataPlane and ControlPlane in KafkaServer seem to be divided into BrokerServer and ControllerServer. It appears that the initial implementation of BrokerServer initialized and used the controlPlaneRequestProcessor, but it seems to have been removed, except for the code used in the shutdown method, through subsequent modifications.(#10931) Reviewers: Chia-Ping Tsai <[email protected]>

…ore v2.8 (#15444) Change the function with a better way to deal with the NULL pointer exception. Reviewers: Luke Chen <[email protected]>

…s (#14426) Kafka Streams support asymmetric join windows. Depending on the window configuration we need to compute window close time etc differently. This PR flips `joinSpuriousLookBackTimeMs`, because they were not correct, and introduced the `windowsAfterIntervalMs`-field that is used to find if emitting records can be skipped. Reviewers: Hao Li <[email protected]>, Guozhang Wang <[email protected]>, Matthias J. Sax <[email protected]>

Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]>

…365) Is contains some of ConsoleGroupCommand tests rewritten in java. Intention of separate PR is to reduce changes and simplify review. Reviewers: Chia-Ping Tsai <[email protected]>

This should help us avoid testing MVs before they are usable (stable). We revert back from testing 3.8 in this case since 3.7 is the current stable version. Reviewers: Proven Provenzano <[email protected]>, Justine Olshan <[email protected]>

…#15254) This pull requests migrates the StateDirectory mock in TaskManagerTest from EasyMock to Mockito. The change is restricted to a single mock to minimize the scope and make it easier for review. Reviewers: Ismael Juma <[email protected]>, Bruno Cadonna <[email protected]>

Upgrading the test to use the consumer group protocol. The two tests are failing due to Mismatch Assignment Reviewers: Lucas Brutschy <[email protected]>

@matrix

…protocol config (#15577) Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]>

@matrix

…rotocol config (#15567) Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]>

As a part of KIP-890, we are introducing a new class of Exceptions which when encountered shall lead to Aborting the ongoing Transaction. The following PR introduces the same with client side handling and server side changes. On client Side, the code attempts to handle the exception as an Abortable error and ensure that it doesn't take the producer to a fatal state. For each of the Transactional APIs, we have added the appropriate handling. For the produce request, we have verified that the exception transitions the state to Aborted. On the server side, we have bumped the ProduceRequest, ProduceResponse, TxnOffestCommitRequest and TxnOffsetCommitResponse Version. The appropriate handling on the server side has been added to ensure that the new error case is sent back only for the new clients. The older clients will continue to get the old Invalid_txn_state exception to maintain backward compatibility. Reviewers: Calvin Liu <[email protected]>, Justine Olshan <[email protected]>

…cross threads (#15550) Reviewers: vamossagar12 <[email protected]>, Chia-Ping Tsai <[email protected]>

Reviewers: Chia-Ping Tsai <[email protected]>

…aVersionTest.testFromVersionString (#15563) Reviewers: Chia-Ping Tsai <[email protected]>

…on or --help (#15583) Reviewers: Chia-Ping Tsai <[email protected]>

… exit code on error (#15591) Reviewers: Chia-Ping Tsai <[email protected]>

…5534) When the group coordinator is under heavy load, the current mechanism to release pending events based on updated high watermark, which consist in pushing an event at the end of the queue, is bad because pending events pay the cost of the queue twice. A first time for the handling of the first event and a second time for the handling of the hwm update. This patch changes this logic to push the hwm update event to the front of the queue in order to release pending events as soon as as possible. Reviewers: Jeff Kim <[email protected]>, Justine Olshan <[email protected]>

Reviewers: Mickael Maison <[email protected]>

This PR includes a fix to properly identify a reconciliation that should be interrupted and not applied because the member has rejoined. It does so simply based on a flag (not epochs, server or local). If the member has rejoined while reconciling, the reconciliation will be interrupted. This also ensures that the check to abort the reconciliation is performed on all the 3 stages of the reconciliation that could be delayed: commit, onPartitionsRevoked, onPartitionsAssigned. Reviewers: David Jacot <[email protected]>, Lucas Brutschy <[email protected]>

Splitting consumer integration tests to allow for parallelization and reduce build times. This PR is only extracting tests from PlainTextConsumerTest into separate files, no changes in logic. Grouping tests by the feature they relate to so that they can be easily found Reviewers: Andrew Schofield <[email protected]>, Lucas Brutschy <[email protected]>

In between HeartbeatRequest being sent and the response being handled, i.e. while a HeartbeatRequest is in flight, an extra request may be immediately scheduled if propagateDirectoryFailure, setReadyToUnfence, or beginControlledShutdown is called. To prevent the extra request, we can avoid the extra requests by checking whether a request is in flight, and delay the scheduling if necessary. Some of the tests in BrokerLifecycleManagerTest are also improved to remove race conditions and reduce flakiness. Reviewers: Colin McCabe <[email protected]>, Ron Dagostino <[email protected]>, Jun Rao <[email protected]>

…15581) Current logic for auto-committing offsets when partitions are revoked will retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the member not completing the revocation in time. This commit considers error UNKNOWN_TOPIC_OR_PARTITION to be fatal in the context of an auto-commit of offsets before a revocation, even though the error is defined as retriable. This ensures that the revocation can finish in time. Reviewers: Andrew Schofield <[email protected]>, Lucas Brutschy <[email protected]>, Lianet Magrans <[email protected]>

…fication (#15559) KIP-890 Part 1 introduced verification of transactions with the transaction coordinator on the `Produce` and `TxnOffsetCommit` paths. This introduced the possibility of new errors when responding to those requests. For backwards compatibility with older clients, a choice was made to convert some of the new retriable errors to existing errors that are expected and retried correctly by older clients. `NETWORK_EXCEPTION` was forgotten about and not converted, but can occur if, for example, the transaction coordinator is temporarily refusing connections. Now, we convert it to: * `NOT_ENOUGH_REPLICAS` on the `Produce` path, just like the other retriable errors that can arise from transaction verification. * `COORDINATOR_LOAD_IN_PROGRESS` on the `TxnOffsetCommit` path. This error does not force coordinator lookup on clients, unlike `COORDINATOR_NOT_AVAILABLE`. Note that this deviates from KIP-890, which says that retriable errors should be converted to `COORDINATOR_NOT_AVAILABLE`. Reviewers: Artem Livshits <[email protected]>, David Jacot <[email protected]>, Justine Olshan <[email protected]>

…ing (#15586) DeleteRecordsCommand should use standard exception handling Reviewers: Luke Chen <[email protected]>

…d (#15592) Reviewers: Chia-Ping Tsai <[email protected]>

Signed-off-by: Robin Han <[email protected]>

CLAassistant · 2024-03-26T12:42:06Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 14 committers have signed the CLA.

✅ superhx
❌ johnnychhsu
❌ kirktrue
❌ nizhikov
❌ sjhajharia
❌ lianetm
❌ VedarthConfluent
❌ wernerdv
❌ brandboat
❌ dajac
❌ cadonna
❌ squah-confluent
❌ soarez
❌ FrankYang0529
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Signed-off-by: Robin Han <[email protected]>

daniel-y

LGTM～

gaurav-narula and others added 30 commits February 28, 2024 09:37

MINOR: Optimize EventAccumulator (#15430)

52289c9

`poll(long timeout, TimeUnit unit)` is either used with `Long.MAX_VALUE` or `0`. This patch replaces it with `poll` and `take`. It removes the `awaitNanos` usage. Reviewers: Jeff Kim <[email protected]>, Justine Olshan <[email protected]>

MINOR: Remove the space between two words (#15439)

1bb9a85

Remove the space between two words Reviewers: Luke Chen <[email protected]>

MINOR: fix link for ListTransactionsOptions#filterOnDuration (#15459)

44af72f

Reviewers: Chia-Ping Tsai <[email protected]>

MINOR: fix SessionStore java doc (#15412)

907e945

Reviewers: Chia-Ping Tsai <[email protected]>

MINOR: Remove unnecessary easymock/powermock dependencies (#15460)

2c0cab3

These projects don't actually use easymock/powermock. Reviewers: Chia-Ping Tsai <[email protected]>

MINOR: simplify ensure topic exists condition (#15458)

c254b22

Reviewers: Chia-Ping Tsai <[email protected]>

MINOR: parameterize group-id in GroupMetadataManagerTestContext (#15467)

2f401ff

This pr parameterize some group ids in GroupMetadataManagerTestContext that are now constant strings. Reviewers: Chia-Ping Tsai <[email protected]>

KAFKA-16209 : fetchSnapshot might return null if topic is created bef…

554fa57

…ore v2.8 (#15444) Change the function with a better way to deal with the NULL pointer exception. Reviewers: Luke Chen <[email protected]>

KAFKA-16347: Upgrade zookeeper 3.8.3 -> 3.8.4 (#15480)

ae047bb

Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]>

KAFKA-14589 [3/4] Tests of ConsoleGroupCommand rewritten in java (#15…

f6198bc

…365) Is contains some of ConsoleGroupCommand tests rewritten in java. Intention of separate PR is to reduce changes and simplify review. Reviewers: Chia-Ping Tsai <[email protected]>

ahuang98 and others added 21 commits March 21, 2024 11:06

KAFKA-16271: Upgrade consumer_rolling_upgrade_test.py (#15578)

fb03da0

Upgrading the test to use the consumer group protocol. The two tests are failing due to Mismatch Assignment Reviewers: Lucas Brutschy <[email protected]>

KAFKA-16381 use volatile to guarantee KafkaMetric#config visibility a…

6eba905

…cross threads (#15550) Reviewers: vamossagar12 <[email protected]>, Chia-Ping Tsai <[email protected]>

MINOR: Tuple2 replaced with Map.Entry (#15560)

0f216b6

Reviewers: Chia-Ping Tsai <[email protected]>

KAFKA-16388 add production-ready test of 3.3 - 3.6 release to Metadat…

bf9a27f

…aVersionTest.testFromVersionString (#15563) Reviewers: Chia-Ping Tsai <[email protected]>

KAFKA-16408 kafka-get-offsets / GetOffsetShell doesn't handle --versi…

0434c29

…on or --help (#15583) Reviewers: Chia-Ping Tsai <[email protected]>

KAFKA-16410 kafka-leader-election / LeaderElectionCommand doesn't set…

7b2fc46

… exit code on error (#15591) Reviewers: Chia-Ping Tsai <[email protected]>

KAFKA-15882: Add nightly docker image scan job (#15013)

e9e007a

Reviewers: Mickael Maison <[email protected]>

KAFKA-16409: DeleteRecordsCommand should use standard exception handl…

fa1cf79

…ing (#15586) DeleteRecordsCommand should use standard exception handling Reviewers: Luke Chen <[email protected]>

KAFKA-16415 Fix handling of "--version" option in ConsumerGroupComman…

2d4abb8

…d (#15592) Reviewers: Chia-Ping Tsai <[email protected]>

fix(all): fix merge conflict

daeaabe

Signed-off-by: Robin Han <[email protected]>

superhx requested review from SCNieh, ShadowySpirits, Chillax-0v0 and daniel-y March 26, 2024 12:41

fix(test): fix ElasticUnifiedLog test

6e1edaf

Signed-off-by: Robin Han <[email protected]>

daniel-y approved these changes Mar 26, 2024

View reviewed changes

superhx merged commit 49eb9cd into main Mar 26, 2024
6 of 7 checks passed

superhx deleted the merge_trunk branch March 26, 2024 13:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: merge apache kafka trunk #1030

feat: merge apache kafka trunk #1030

superhx commented Mar 26, 2024

CLAassistant commented Mar 26, 2024 •

edited

Loading

daniel-y left a comment

feat: merge apache kafka trunk #1030

feat: merge apache kafka trunk #1030

Conversation

superhx commented Mar 26, 2024

CLAassistant commented Mar 26, 2024 • edited Loading

daniel-y left a comment

Choose a reason for hiding this comment

CLAassistant commented Mar 26, 2024 •

edited

Loading