ISSUE-437: SendAsync() can stall with large BatchingMaxPublishDelay #136

Closed
sijie opened this issue Jan 8, 2021 · 0 comments

sijie commented Jan 8, 2021

Original Issue: apache#437


Observed behavior

When using SendAsync() together with a large BatchingMaxPublishDelay, such that batch flushing is driven mainly by BatchingMaxMessages, send stalls can occur.

I don't fully understand the cause of these stalls, but increasing MaxPendingMessages seems to make them go away.

It may be relevant that I am producing to a topic with several partitions.

I originally came across this problem after recently moving from the cgo client: with the same configuration, the pure Go client exhibited much worse throughput for high-rate sends, giving me a maximum of ~6k records/sec where the cgo client had been giving me ~50k/sec.

Steps to reproduce

Create a producer with a large BatchingMaxPublishDelay and the other values left at their defaults, e.g.

pulsar.ProducerOptions{
  Topic:                   topic,
  CompressionType:         pulsar.ZLib,
  BatchingMaxPublishDelay: 100 * time.Second,
}
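
For context, here is a minimal sketch of how such a producer might be created; the broker URL and topic name are placeholders rather than values from my setup:

package main

import (
  "time"

  "github.com/apache/pulsar-client-go/pulsar"
)

func newProducer() (pulsar.Client, pulsar.Producer, error) {
  // Connect to the broker; the URL is a placeholder.
  client, err := pulsar.NewClient(pulsar.ClientOptions{
    URL: "pulsar://localhost:6650",
  })
  if err != nil {
    return nil, nil, err
  }

  // Large BatchingMaxPublishDelay with the other batching values left at
  // their defaults, so flushing is driven mainly by BatchingMaxMessages.
  producer, err := client.CreateProducer(pulsar.ProducerOptions{
    Topic:                   "my-partitioned-topic",
    CompressionType:         pulsar.ZLib,
    BatchingMaxPublishDelay: 100 * time.Second,
  })
  if err != nil {
    client.Close()
    return nil, nil, err
  }
  return client, producer, nil
}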

Enable debug logging and produce to a partitioned topic with a reasonable number of partitions (six, in my case) using SendAsync(), with a callback set that does nothing except crash on error; a sketch of such a loop follows below. Note that the debug log frequently stalls after a "Received send request" message and pauses until a flush initiated by the max publish delay occurs.
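
Continuing from the setup sketch above, the produce loop looks roughly like this (payload contents and message count are illustrative):

import (
  "context"
  "fmt"

  "github.com/apache/pulsar-client-go/pulsar"
)

// produce sends n messages with SendAsync(); the callback does nothing except
// crash the process on error, mirroring the reproduction described above.
func produce(producer pulsar.Producer, n int) {
  for i := 0; i < n; i++ {
    producer.SendAsync(context.Background(), &pulsar.ProducerMessage{
      Payload: []byte(fmt.Sprintf("record-%d", i)),
    }, func(id pulsar.MessageID, msg *pulsar.ProducerMessage, err error) {
      if err != nil {
        panic(err)
      }
    })
  }
}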

Increase MaxPendingMessages to 2000 and try again. The stalls now go away.

Increase MaxPendingMessages to 6000 and try again. The stalls are still gone, and throughput appears somewhat better than in the previous case.
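
The only change relative to the original options is MaxPendingMessages, e.g.:

pulsar.ProducerOptions{
  Topic:                   topic,
  CompressionType:         pulsar.ZLib,
  BatchingMaxPublishDelay: 100 * time.Second,
  MaxPendingMessages:      6000, // raised from the default; removes the stalls in my testing
}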

System configuration

Pulsar version: 2.6.1
Client version: 71cc54f (current master)
