Expected behavior
Consume the sendRequests before closing the producer: send and flush them, and invoke the callbacks of the input messages, so that the application knows whether producing succeeded or failed.
Actual behavior
Currently, when we close the producer, we forget to consume the sendRequests remaining in partitionProducer.dataChan before closing.
When producing is faster than consuming in partitionProducer, many sendRequests have been pushed into partitionProducer.dataChan; at close time, the ones not yet consumed are effectively lost. Their callbacks are never invoked, so the application never learns the producing result of those messages.
Steps to reproduce
Review the code of partitionProducer
```go
func (p *partitionProducer) internalSendAsync(ctx context.Context, msg *ProducerMessage,
	callback func(MessageID, *ProducerMessage, error), flushImmediately bool) {
	// Register transaction operation to transaction and the transaction coordinator.
	var newCallback func(MessageID, *ProducerMessage, error)
	if msg.Transaction != nil {
		transactionImpl := (msg.Transaction).(*transaction)
		if transactionImpl.state != TxnOpen {
			p.log.WithField("state", transactionImpl.state).Error("Failed to send message" +
				" by a non-open transaction.")
			callback(nil, msg, newError(InvalidStatus, "Failed to send message by a non-open transaction."))
			return
		}

		if err := transactionImpl.registerProducerTopic(p.topic); err != nil {
			callback(nil, msg, err)
			return
		}
		if err := transactionImpl.registerSendOrAckOp(); err != nil {
			callback(nil, msg, err)
		}
		newCallback = func(id MessageID, producerMessage *ProducerMessage, err error) {
			callback(id, producerMessage, err)
			transactionImpl.endSendOrAckOp(err)
		}
	} else {
		newCallback = callback
	}
	if p.getProducerState() != producerReady {
		// Producer is closing
		newCallback(nil, msg, errProducerClosed)
		return
	}

	// bc only works when DisableBlockIfQueueFull is false
	bc := make(chan struct{})

	// callbackOnce make sure the callback is only invoked once in chunking
	callbackOnce := &sync.Once{}
	var txn *transaction
	if msg.Transaction != nil {
		txn = (msg.Transaction).(*transaction)
	}
	sr := &sendRequest{
		ctx:              ctx,
		msg:              msg,
		callback:         newCallback,
		callbackOnce:     callbackOnce,
		flushImmediately: flushImmediately,
		publishTime:      time.Now(),
		blockCh:          bc,
		closeBlockChOnce: &sync.Once{},
		transaction:      txn,
	}
	p.options.Interceptors.BeforeSend(p, msg)

	p.dataChan <- sr

	if !p.options.DisableBlockIfQueueFull {
		// block if queue full
		<-bc
	}
}
```
```go
func (p *partitionProducer) internalClose(req *closeProducer) {
	defer close(req.doneCh)
	if !p.casProducerState(producerReady, producerClosing) {
		return
	}

	p.log.Info("Closing producer")

	id := p.client.rpcClient.NewRequestID()
	_, err := p.client.rpcClient.RequestOnCnx(p._getConn(), id, pb.BaseCommand_CLOSE_PRODUCER, &pb.CommandCloseProducer{
		ProducerId: &p.producerID,
		RequestId:  &id,
	})
	if err != nil {
		p.log.WithError(err).Warn("Failed to close producer")
	} else {
		p.log.Info("Closed producer")
	}

	if p.batchBuilder != nil {
		if err = p.batchBuilder.Close(); err != nil {
			p.log.WithError(err).Warn("Failed to close batch builder")
		}
	}

	p.setProducerState(producerClosed)
	p._getConn().UnregisterListener(p.producerID)
	p.batchFlushTicker.Stop()
}
```
And in the interface of the Producer, we said "Waits until all pending write request are persisted. In case of errors, pending writes will not be retried." I think the messages that have been submitted via SendAsync should be treated as pending write requests, so we should send and flush them to make sure each of them is completed with a result.
```go
// Producer is used to publish messages on a topic
type Producer interface {
	...
	// Close the producer and releases resources allocated
	// No more writes will be accepted from this producer. Waits until all pending write request are persisted. In case
	// of errors, pending writes will not be retried.
	Close()
}
```
System configuration
Pulsar version: x.y
@zengguan @merlimat @wolfstudy