Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (#4438)
emasab authored and anchitj committed Nov 26, 2024
1 parent e75de5b commit 554285f
Showing 6 changed files with 407 additions and 13 deletions.
17 changes: 16 additions & 1 deletion CHANGELOG.md
@@ -25,7 +25,8 @@ librdkafka v2.2.0 is a feature release:
Add DNS alias support for secured connection (#4292).
* [KIP-339](https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+IncrementalAlterConfigs+API):
IncrementalAlterConfigs API (started by @PrasanthV454, #4110).
-* [KIP-554](https://cwiki.apache.org/confluence/display/KAFKA/KIP-554%3A+Add+Broker-side+SCRAM+Config+API): Add Broker-side SCRAM Config API (#4241).
+* [KIP-554](https://cwiki.apache.org/confluence/display/KAFKA/KIP-554%3A+Add+Broker-side+SCRAM+Config+API): Add Broker-side SCRAM Config API (#4241).
+* Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (#4438).


## Enhancements
@@ -72,6 +73,20 @@ librdkafka v2.2.0 is a feature release:
assignment completely.


+### Idempotent producer fixes
+
+* After a possibly persisted error, such as a disconnection or a timeout, the next expected sequence
+used to increase, leading to a fatal error if the message wasn't persisted and
+the second one in queue failed with an `OUT_OF_ORDER_SEQUENCE_NUMBER`.
+The error could contain the message "sequence desynchronization" with
+just one possibly persisted error or "rewound sequence number" in case of
+multiple errored messages.
+Solved by treating the possibly persisted message as _not_ persisted,
+and expecting a `DUPLICATE_SEQUENCE_NUMBER` error in case it was or
+`NO_ERROR` in case it wasn't. In both cases the message will be considered
+delivered (#4438).
+
+

# librdkafka v2.1.1

19 changes: 7 additions & 12 deletions src/rdkafka_request.c
@@ -3336,17 +3336,12 @@ static int rd_kafka_handle_Produce_error(rd_kafka_broker_t *rkb,
* which should not be treated as a fatal error
* since this request and sub-sequent requests
* will be retried and thus return to order.
- * Unless the error was a timeout, or similar,
- * in which case the request might have made it
- * and the messages are considered possibly persisted:
- * in this case we allow the next in-flight response
- * to be successful, in which case we mark
- * this request's messages as succesfully delivered. */
-if (perr->status &
-RD_KAFKA_MSG_STATUS_POSSIBLY_PERSISTED)
-perr->update_next_ack = rd_true;
-else
-perr->update_next_ack = rd_false;
+ * In case the message is possibly persisted
+ * we still treat it as not persisted,
+ * expecting DUPLICATE_SEQUENCE_NUMBER
+ * in case it was persisted or NO_ERROR in case
+ * it wasn't. */
+perr->update_next_ack = rd_false;
perr->update_next_err = rd_true;

/* Drain outstanding requests so that retries
@@ -3627,7 +3622,7 @@ static void rd_kafka_msgbatch_handle_Produce_result(
.err = err,
.incr_retry = 1,
.status = status,
-.update_next_ack = rd_true,
+.update_next_ack = rd_false,
.update_next_err = rd_true,
.last_seq = (batch->first_seq +
rd_kafka_msgq_len(&batch->msgq) - 1)};
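
The behavior described in the changelog entry and in the two hunks above can be illustrated with a small standalone sketch. This is not librdkafka code: the types and functions (`toy_broker_t`, `toy_broker_produce`, `retry_after_possibly_persisted`, `check_both_cases`) and the error enum are hypothetical names made up for this example, and the broker model is deliberately simplified to a single expected sequence counter. It only shows why retrying a possibly persisted message with its original sequence number ends in either `NO_ERROR` or `DUPLICATE_SEQUENCE_NUMBER`, and why the next message in the queue then stays in order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified error codes; not the librdkafka enum. */
typedef enum {
        ERR_NO_ERROR,
        ERR_DUPLICATE_SEQUENCE_NUMBER,
        ERR_OUT_OF_ORDER_SEQUENCE_NUMBER
} resp_err_t;

/* Toy broker (assumption): tracks only the next sequence it expects. */
typedef struct {
        int next_seq;
} toy_broker_t;

/* Accept the expected sequence, flag older ones as duplicates
 * and newer ones as out of order. */
static resp_err_t toy_broker_produce(toy_broker_t *b, int seq) {
        if (seq == b->next_seq) {
                b->next_seq++;
                return ERR_NO_ERROR;
        }
        return seq < b->next_seq ? ERR_DUPLICATE_SEQUENCE_NUMBER
                                 : ERR_OUT_OF_ORDER_SEQUENCE_NUMBER;
}

/* Retry a message whose first attempt ended in a timeout or disconnect.
 * was_persisted models whether the broker stored that first attempt
 * even though the producer never saw the response.
 * The producer does not advance its expected sequence: it resends the
 * same sequence and treats both outcomes below as "delivered". */
static bool retry_after_possibly_persisted(toy_broker_t *b,
                                           int seq,
                                           bool was_persisted) {
        resp_err_t err;

        if (was_persisted)
                (void)toy_broker_produce(b, seq); /* response was lost */

        err = toy_broker_produce(b, seq);
        return err == ERR_NO_ERROR || err == ERR_DUPLICATE_SEQUENCE_NUMBER;
}

/* Exercise both cases: the retry is delivered and the next message
 * (sequence 1) is accepted in order either way. */
static void check_both_cases(void) {
        int persisted;

        for (persisted = 0; persisted <= 1; persisted++) {
                toy_broker_t b = {0};
                assert(retry_after_possibly_persisted(&b, 0, persisted != 0));
                assert(toy_broker_produce(&b, 1) == ERR_NO_ERROR);
        }
}
```

Calling `check_both_cases()` from a `main` function runs both paths. In the toy model, skipping ahead instead of retrying (for example sending sequence 2 while the broker expects 1) yields `ERR_OUT_OF_ORDER_SEQUENCE_NUMBER`, which corresponds to the failure mode the changelog describes for the old behavior.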