SendAsync() can stall with large BatchingMaxPublishDelay #437
Comments
Profiling the goroutines in my test program, at the time the send is stalled, results in this. Note that
In the creation of
I am unclear why this semaphore size is inadequate, but this seems like it might be the source of the issue. Is there perhaps a case where the semaphore can fill before the message-count-based batch flush actually occurs, so that the corresponding release in
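To make the question above concrete, here is a toy model of that scenario. It is only a sketch under stated assumptions, not the client's actual code: `stallsBeforeCountFlush`, `maxPending`, and `flushAt` are hypothetical names, acks are assumed to arrive instantly on flush, and the case where the flush threshold exceeds the semaphore size is an assumption chosen purely to show what such a stall would look like.

```go
package main

import "fmt"

// stallsBeforeCountFlush is a toy model (hypothetical, not pulsar-client-go code).
// Each send takes one permit from a semaphore of size maxPending. The batch
// flushes once it holds flushAt messages, and a flush returns the permits of
// every message in the batch (acks are assumed to be immediate).
// It reports whether some send finds no free permit before the count-based
// flush fires, in which case only a timer-driven flush could unblock it.
func stallsBeforeCountFlush(maxPending, flushAt, sends int) bool {
	sem := make(chan struct{}, maxPending)
	batched := 0
	for i := 0; i < sends; i++ {
		select {
		case sem <- struct{}{}: // permit acquired, message joins the batch
			batched++
		default:
			return true // semaphore is full and the batch has not flushed
		}
		if batched >= flushAt {
			// Count-based flush: release one permit per batched message.
			for ; batched > 0; batched-- {
				<-sem
			}
		}
	}
	return false
}

func main() {
	fmt.Println(stallsBeforeCountFlush(1000, 1000, 5000)) // flush fires exactly when the semaphore fills
	fmt.Println(stallsBeforeCountFlush(1000, 1001, 5000)) // flush threshold is one past the semaphore size
	fmt.Println(stallsBeforeCountFlush(2000, 1001, 5000)) // larger semaphore leaves headroom
}
```

In this model a stall appears only when the flush threshold is effectively larger than the number of permits, and raising `maxPending` removes it. Whether the real client ever reaches that state is exactly the open question here.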
Currently, in batch_builder.go, If we change that, i.e. edit
to
...then the problem also seems to go away, without changing It could be the case that increasing However, I'm not confident enough that this is the correct fix to send a PR for it. Would appreciate @merlimat or @wolfstudy taking a look if you have time.
I worked around this bug by setting This seems to give me throughput comparable to, or better than, the cgo client.
I believe this is probably fixed by #528, and so I am closing.
Observed behavior
When using SendAsync() together with a large BatchingMaxPublishDelay, such that batch flushing is driven mainly by BatchingMaxMessages, send stalls can occur.
I don't fully understand the cause of these, but increasing MaxPendingMessages seems to make them go away.
It may be relevant that I am producing to a topic with several partitions.
I originally came across this problem because I recently moved from the cgo client, and for the same configuration the pure Go client was exhibiting much worse throughput for high-rate sends: I was getting a maximum of ~6k records/sec where the cgo client had been giving me ~50k/sec.
Steps to reproduce
Create a producer with a large BatchingMaxPublishDelay and the other values left at their defaults.
Enable debug logging and produce to a partitioned topic with a reasonable number of partitions (in my case, six), using SendAsync(), with a callback function set (the callback does nothing except crash on error). Note that the debug log will frequently stall after a "Received send request" message, and pause until a flush initiated by the max publish delay occurs.
Increase MaxPendingMessages to 2000 and try again. The stalls now go away.
Increase MaxPendingMessages to 6000 and try again. The stalls are still gone, and throughput perhaps appears better than in the case above.
System configuration
Pulsar version: 2.6.1
Client version: 71cc54f (current master)