Performance issue with producer - low throughput #3328
Are you perhaps doing synchronous producing, that is, waiting for each produce() to complete by calling flush() or poll(long_timeout)?
Thanks for the quick response @edenhill. I am not doing synchronous produce. My producer application has a separate polling thread that polls indefinitely: Apart from this, no poll() (or flush()) is done in my main thread, which produces the records.
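The asynchronous pattern described above — a dedicated thread serving delivery reports while the main thread only calls produce() — can be sketched as follows. This is a minimal Python illustration with a stand-in producer object (hypothetical, named StandInProducer); with librdkafka proper, the loop body would be rd_kafka_poll(rk, timeout_ms), and with confluent-kafka, producer.poll(timeout).

```python
import threading
import time

class StandInProducer:
    """Stand-in for a Kafka producer handle (hypothetical, for illustration only).

    The poll() call below corresponds to librdkafka's rd_kafka_poll(rk, timeout_ms),
    which serves delivery-report callbacks."""
    def __init__(self):
        self.polls = 0

    def poll(self, timeout):
        self.polls += 1          # pretend to serve delivery reports
        time.sleep(timeout)

producer = StandInProducer()
stop = threading.Event()

def poll_loop():
    # Dedicated polling thread: runs indefinitely, independent of produce().
    while not stop.is_set():
        producer.poll(0.01)

poller = threading.Thread(target=poll_loop, daemon=True)
poller.start()

# The main thread would only enqueue records here via produce(),
# never calling flush() or a long-timeout poll() itself.
time.sleep(0.05)
stop.set()
poller.join()
```

The point of the pattern is that delivery reports are serviced continuously in the background, so produce() never blocks waiting for them.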
What configuration properties are you setting? (It is quite a bit of work to see which ones are set to non-default values in the config dump above.) Are there any process limits in play, perhaps the number of concurrent threads running?
These are the only configurations set: %7|1617777172.751|CONF|rdkafka#producer-1| [thrd::0/internal]: Client configuration: I checked my system configuration for process limits; the current configuration can go up to 1500 concurrent threads: CURRENT UNIX CONFIGURATION SETTINGS:
Oh, you are setting
That will not work well, since the client will not receive much back pressure or pacing from the broker but will just blast away messages as quickly as it can, presumably faster than the broker can handle.
acks=1 did not yield much throughput gain either; it is still around 1724 records per second.
I recommend removing "batch.num.messages"; it is better for the client to fill batches as large as possible.
You can also reduce linger.ms to 100 ms.
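Taken together, the suggestions so far amount to a producer configuration along these lines. This is a sketch, not a verified fix: the property names are real librdkafka properties, but the broker address is a placeholder.

```python
# Producer settings suggested in this thread (librdkafka property names).
producer_conf = {
    "bootstrap.servers": "broker:9092",  # placeholder address
    "acks": "1",           # was 0: lets the broker apply back pressure and pacing
    "linger.ms": "100",    # was queue.buffering.max.ms=1000
    # "batch.num.messages" deliberately left unset: let the client fill
    # batches as large as possible rather than capping them at 1000.
}
assert "batch.num.messages" not in producer_conf
```

With confluent-kafka this dict could be passed directly to Producer(producer_conf); with the C API each entry maps to an rd_kafka_conf_set() call.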
Still no luck; we came down to ~1100 rps.
When the socket buffers become full there's this 1s delay between sends:
Are you comfortable with rebuilding librdkafka yourself?
That did the trick! I see the throughput went up to ~25K to 33K rps. Now, would it make sense to make the max block time (rd_kafka_max_block_ms) configurable by the client? Thanks for your precious time on this, @edenhill.
Awesome! The max_block_ms is a safety harness which shouldn't really come into play; it is just there to make sure we don't over-poll or under-poll, and the code should wake up automatically when there are more messages.
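The numbers in this thread are consistent with that explanation: if the sender thread only wakes up about once per second (the 1 s delay mentioned above) and ships roughly one batch.num.messages=1000 batch per wakeup, the ceiling is about 1000 records/s — close to the ~1100-1560 rps observed. A rough back-of-the-envelope check (an illustration of the arithmetic, not librdkafka internals):

```python
def throughput_ceiling(wakeup_interval_s, records_per_wakeup):
    """Upper bound on records/s if the sender only wakes up periodically."""
    return records_per_wakeup / wakeup_interval_s

# ~1 s wakeup, one 1000-record batch per wakeup:
print(throughput_ceiling(1.0, 1000))   # 1000.0, near the observed ~1100-1560 rps
# ~30 ms effective wakeup after the fix:
print(throughput_ceiling(0.03, 1000))  # ~33333, near the observed ~25K-33K rps
```

The exact wakeup interval after the fix is an assumption here; it is chosen only to show that the observed jump in throughput matches a much shorter blocking time.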
Indeed! The issue can be closed. Thanks again!
Dup of #2912
@sendaun Could you try the https://github.com/edenhill/librdkafka/tree/nokinitinnew branch? (Without reducing max_block_ms.)
Sure, I will test and keep you posted on the result.
Hi @edenhill, it appears this does not fix the problem. Tested with multiple runs but unable to cross 1500 records per second.
Hey, can you try to reproduce on this branch? https://github.com/edenhill/librdkafka/tree/qwakeupfix
Unable to locate the branch. Should I try master?
Yes please
Description
My application has a throughput requirement of at least 5000 records per second (with records ranging from 200 to 300 bytes each). To that end, I have been testing librdkafka throughput for the past week, tweaking various configuration parameters. So far the maximum throughput I could achieve is ~1560 records per second (nowhere near the performance numbers I read here).
A few observations to share:
(1) As suggested in various forums, I tried various batch sizes and queue.buffering.max.ms values (between 1 ms and 1000 ms), but saw hardly any difference in throughput.
(2) The statistics show significantly high internal latency. [Attached]
librd_stat.txt
(3) Adding another producer instance almost doubles the throughput (~2k to 2.5k records per second).
(4) As observed in issue #2356, I also notice that reducing the record size from 300 bytes to 30 bytes drastically increases the throughput (I don't have exact numbers, but it is significantly higher).
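One quick sanity check worth noting for observation (1): at the observed rates the wire throughput is tiny, so raw bandwidth is not the bottleneck. The arithmetic (using the figures quoted above):

```python
record_bytes = 300    # record size from the description above
observed_rps = 1560   # best observed throughput

throughput_mb_s = observed_rps * record_bytes / 1e6
print(round(throughput_mb_s, 3))  # well under 1 MB/s
```

At roughly half a megabyte per second, the ceiling is far below what a 3-broker cluster or the network can absorb, which points at client-side pacing rather than capacity.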
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
librdkafka version: v1.6.1
Apache Kafka version: 2.7.0
Operating system: z/OS (Unix System Services)
client.id = rdkafka
client.software.name = librdkafka
metadata.broker.list = 192.0.1.2:9092
message.max.bytes = 600000
message.copy.max.bytes = 65535
receive.message.max.bytes = 100000000
max.in.flight.requests.per.connection = 1000000
metadata.request.timeout.ms = 60000
topic.metadata.refresh.interval.ms = 5000
metadata.max.age.ms = 900000
topic.metadata.refresh.fast.interval.ms = 250
topic.metadata.refresh.fast.cnt = 10
topic.metadata.refresh.sparse = true
topic.metadata.propagation.max.ms = 30000
debug =
socket.timeout.ms = 60000
socket.blocking.max.ms = 1
socket.send.buffer.bytes = 0
socket.receive.buffer.bytes = 0
socket.keepalive.enable = false
socket.nagle.disable = false
socket.max.fails = 1
broker.address.ttl = 1000
broker.address.family = any
enable.sparse.connections = true
reconnect.backoff.jitter.ms = 0
reconnect.backoff.ms = 100
reconnect.backoff.max.ms = 10000
statistics.interval.ms = 10000
enabled_events = 0
stats_cb = 0x1e60a4e0
log_cb = 0x1e643cd0
log_level = 6
log.queue = false
log.thread.name = true
enable.random.seed = true
log.connection.close = true
socket_cb = 0x1e6afa58
open_cb = 0x1e7db490
default_topic_conf = 0x5000102b60
internal.termination.signal = 0
api.version.request = true
api.version.request.timeout.ms = 10000
api.version.fallback.ms = 0
broker.version.fallback = 0.10.0
security.protocol = plaintext
ssl.ca.certificate.stores = Root
enable.ssl.certificate.verification = true
ssl.endpoint.identification.algorithm = none
sasl.mechanisms = GSSAPI
sasl.kerberos.service.name = kafka
sasl.kerberos.principal = kafkaclient
sasl.kerberos.kinit.cmd = kinit -R -t "%{sasl.kerberos.keytab}" -k %{sasl.kerberos.principal} || kinit -t "%{sasl.kerberos.keytab}" -k %{sasl.kerberos.principal}
sasl.kerberos.min.time.before.relogin = 60000
enable.sasl.oauthbearer.unsecure.jwt = false
test.mock.num.brokers = 0
partition.assignment.strategy = range,roundrobin
session.timeout.ms = 10000
heartbeat.interval.ms = 3000
group.protocol.type = consumer
coordinator.query.interval.ms = 600000
max.poll.interval.ms = 300000
enable.auto.commit = true
auto.commit.interval.ms = 5000
enable.auto.offset.store = true
queued.min.messages = 100000
queued.max.messages.kbytes = 65536
fetch.wait.max.ms = 500
fetch.message.max.bytes = 1048576
fetch.max.bytes = 52428800
fetch.min.bytes = 1
fetch.error.backoff.ms = 500
offset.store.method = broker
isolation.level = read_committed
enable.partition.eof = false
check.crcs = false
allow.auto.create.topics = false
client.rack =
transaction.timeout.ms = 60000
enable.idempotence = false
enable.gapless.guarantee = false
queue.buffering.max.messages = 100000
queue.buffering.max.kbytes = 1048576
queue.buffering.max.ms = 1000
message.send.max.retries = 2147483647
retry.backoff.ms = 100
queue.buffering.backpressure.threshold = 1
compression.codec = none
batch.num.messages = 1000
batch.size = 1000000
delivery.report.only.error = true
sticky.partitioning.linger.ms = 10
Broker configuration: 3 brokers, 1 topic/ 3 partitions, RF:3