-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][pip] PIP-372: Allow consumers to terminally NACK a message, bypassing retries and routing directly to the DLQ #23133
Open
jcass8695
wants to merge
5
commits into
apache:master
Choose a base branch
from
jcass8695:add/pip-367
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 4 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
d8d7b64
add: first draft of PIP-367
jcass8695 0e01053
Merge branch 'apache:master' into add/pip-367
jcass8695 a0f4775
add: first draft of PIP-367
jcass8695 ecb1b1b
Supplement java client
crossoverJie b909a16
refactor: pip number from 367 to 372 to avoid clash
jcass8695 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
# PIP-367: Allow consumers to terminally NACK a message, bypassing retries and routing directly to the DLQ | ||
|
||
# Background knowledge | ||
|
||
A consumer has two options when it has finished processing a message: | ||
1. [Acknowledge](https://pulsar.apache.org/docs/3.3.x/concepts-messaging/#acknowledgment) (ACK) the message, signalling | ||
to the broker that this message has been successfully processed and should not be sent again. | ||
2. [Negative Acknowledge](https://pulsar.apache.org/docs/3.3.x/concepts-messaging/#negative-acknowledgment) (NACK) the | ||
message which, after a configured delay, signals to the broker to redeliver the message. | ||
|
||
A consumer can NACK the same message up to a configured maximum number of times. After this maximum number of retries | ||
has been reached, the consumer will attempt to ACK the message and then produce that same message to a [dead letter | ||
topic](https://pulsar.apache.org/docs/3.3.x/concepts-messaging/#dead-letter-topic) (DLQ). Messages in the DLQ can be | ||
consumed and processed 'out of band', e.g. for later manual inspection. | ||
|
||
# Motivation | ||
|
||
If a consumer consumes a message and fails during the processing of that message in a terminal way, we as the client in | ||
certain scenarios may not want to retry that message later, we may want to send it directly to the DLQ for later | ||
inspection. | ||
|
||
Example: A message is received that contains an invalid field value. The business logic dictates that we should fail | ||
processing this message and not retry, however this invalid value might be unexpected and indicate a bug in the system. | ||
It would be useful to have the message routed to the DLQ so that an engineer can reconsume it 'offline' at a later date | ||
to investigate. | ||
|
||
Currently, if the client wanted a persistent record of the message, they would have to do one of: | ||
|
||
1. NACK the message and retry it up to the configured max retries, when it would then land in the DLQ. | ||
2. ACK the message and explicity produce it to the DLQ. | ||
3. ACK the message and persist it in some other system, e.g. a database table, a log file. | ||
|
||
# Goals | ||
|
||
## In Scope | ||
|
||
* Add an interface that allows a consumer to terminally NACK a message, sending it directly to the DLQ instead of | ||
retrying. | ||
|
||
## Out of Scope | ||
|
||
N/A | ||
|
||
|
||
# High Level Design | ||
|
||
Omitted as the detailed design is not overly complex. | ||
|
||
# Detailed Design | ||
|
||
## Design & Implementation Details | ||
|
||
This change could be achieved by changes to the client libraries only, reusing existing functionality. I propose adding | ||
a public method to the Consumer interface `Term`. `Term` would bypass the check between the number of message | ||
redeliveries and the configured maximum and route the message directly to the DLQ. | ||
|
||
Here is an example of the proposed interface change in Go and Java client. | ||
```go | ||
type Consumer interface { | ||
// Term acknowledges the terminal, non-retryable failure to process a single message. | ||
// | ||
// When a message is "terminally negatively acked" it will not be marked for redelivery. Instead, it will be routed | ||
// directly to the DLQ in accordance with the configured [DLQPolicy]. If there is no DLQPolicy configured on the | ||
// consumer, this method will error. | ||
Term(Message) error | ||
} | ||
``` | ||
|
||
```java | ||
public interface Consumer<T> { | ||
|
||
/** | ||
* Non-retryable failure to process a single message. | ||
* <p>When a message is "terminally negatively acked" it will not be marked for redelivery. Instead, | ||
* it will be routed directly to the DLQ in accordance with the configured {@link DeadLetterPolicy} | ||
* | ||
* @param messageId {@link MessageId} to be individual acknowledged | ||
* @throws PulsarClientException.InvalidConfigurationException | ||
* if the consumer did not set the {@link DeadLetterPolicy} | ||
*/ | ||
void terminate(MessageId messageId) throws PulsarClientException; | ||
|
||
void terminate(Message<?> message) throws PulsarClientException; | ||
} | ||
|
||
``` | ||
|
||
Here is the existing flow that a Consumer undergoes in order to route a message to the DLQ. | ||
```mermaid | ||
sequenceDiagram | ||
participant Broker | ||
participant Consumer | ||
loop | ||
Broker->>Consumer: Receive message A | ||
alt attempts > max? | ||
Consumer->>Broker: ACK | ||
Consumer->>Broker: Produce message A to DLQ | ||
Note right of Broker: break | ||
else | ||
Consumer->>Consumer: Process message A | ||
Note right of Consumer: Processing message A fails | ||
Consumer->>Consumer: NACK | ||
Consumer->>Broker: Resend message A | ||
Broker->>Broker: attempts++ | ||
end | ||
end | ||
``` | ||
|
||
Here is the same flow augmented to allow a Consumer to terminally fail a message and route it to the DLQ without having | ||
to retry up to several times. | ||
```mermaid | ||
sequenceDiagram | ||
participant Broker | ||
participant Consumer | ||
loop | ||
Broker->>Consumer: Receive message A | ||
alt attempts > max? | ||
Consumer->>Broker: ACK | ||
Consumer->>Broker: Produce message A to DLQ | ||
Note right of Broker: break | ||
else | ||
Consumer->>Consumer: Process message A | ||
Note right of Consumer: Processing message A fails | ||
alt Terminal failure? | ||
Consumer->>Consumer: TERM | ||
Consumer->>Broker: ACK | ||
Consumer->>Broker: Produce message A to DLQ | ||
Note right of Broker: break | ||
else | ||
Consumer->>Consumer: NACK | ||
Consumer->>Broker: Resend message A | ||
Broker->>Broker: attempts++ | ||
end | ||
end | ||
end | ||
``` | ||
|
||
## Public-facing Changes | ||
|
||
### Public API | ||
|
||
N/A | ||
|
||
### Binary protocol | ||
|
||
N/A | ||
|
||
### Configuration | ||
|
||
N/A | ||
|
||
### CLI | ||
|
||
N/A | ||
|
||
### Metrics | ||
|
||
If the client would like to track the number of calls to the `Term` method, we could add a new metric | ||
`pulsar_client_consumer_terms`. This metric should in theory track exactly with the | ||
`pulsar_client_consumer_dlq_messages`, but it may be desirable to have separate metrics. | ||
|
||
* Name: `pulsar_client_consumer_terms` | ||
* Description: Counter of messages termed by the client | ||
* Attributes: `client`, `pulsar_tenant`, `pulsar_namespace`, `topic` | ||
* Unit: N/A | ||
|
||
> Keep the Java client consistent. | ||
|
||
# Monitoring | ||
|
||
N/A | ||
|
||
# Security Considerations | ||
|
||
N/A | ||
|
||
# Backward & Forward Compatibility | ||
|
||
This proposal includes adding a new method to the Consumer interface in the client libraries, not changing any existing | ||
interface methods therefore this change is backwards and forwards compatible. | ||
|
||
# Alternatives | ||
|
||
N/A | ||
|
||
# General Notes | ||
|
||
N/A | ||
|
||
# Links | ||
|
||
* Mailing List discussion thread: | ||
* Mailing List voting thread: |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pulsar_client_consumer_message_terms
It is recommended to keep the naming conventions consistent.
pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
Lines 429 to 430 in d4bbf10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are all defined with underscores in the Go client library.
https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go#L266-L270
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Java client is currently using OpenTelemetry's naming conventions. I suggest aligning it with the existing naming style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that moving the Go client to be OTel compliant is outside the scope of this change. I think I would rather the Go client be consistent with itself. Changing all of the metrics to be OTel compliant can be done in a single non-backwards compatible change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant that the Java client should follow the current naming conventions. It would be best to also add a description for Java metrics.