Consumer performance benchmarking #1853

poonkothaip · 2024-11-18T09:58:17Z

Description

We are using a basic consumer built on this library to consume messages from MSK. Its performance does not meet our requirements. Can anyone suggest how to tune the configuration parameters for the best performance? Processing 5k messages currently takes around 24-45 minutes, but our requirement is less than 5 minutes. The properties used are as below.

How to reproduce

NA

Checklist

Please provide the following information:

confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): 2.6.0 library and MSK
Apache Kafka broker version: 0.1.0
Client configuration: {...}-ECS Fargate service
Operating system: Linux
Provide client logs (with 'debug': '..' as necessary) NA
Provide broker log excerpts NA
Critical issue Performance


pranavrth · 2024-11-18T12:20:09Z

The performance is pretty bad. Can you check which part of the code is taking the most time? Are you performing any I/O operations between consuming messages?

Are you sure that you are using Apache Kafka version 0.1.0? Version 0.7.0 was released in 2012.

poonkothaip · 2024-11-19T07:00:33Z

Hi Pranav,
Thanks for your note.
Apache Kafka version: 2.7.2
After consuming, we only do a small logging operation for now, but sometimes it takes up to 1 hour. We have 3 partitions.

Thanks
Poongkothai

pranavrth · 2024-11-20T03:54:40Z

Can you log the time taken for each step? Getting a message from the Python client should not take a lot of time.
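
One way to log the time for each step is sketched below. It is a minimal sketch, not the reporter's code: `timed_consume`, `handle`, `max_empty_polls`, and the broker address, group id, and topic in `main()` are all hypothetical names and placeholders. The helper only relies on `poll()`, `error()`, and `value()`, which exist on the confluent-kafka `Consumer` and `Message` objects, and it splits wall-clock time between waiting in `poll()` and running the application's own handler.

```python
import time
from collections import defaultdict


def timed_consume(consumer, handle, max_messages, poll_timeout=1.0, max_empty_polls=10):
    """Consume up to max_messages and return (count, seconds spent per step).

    Stops early after max_empty_polls consecutive polls that return nothing.
    """
    spent = defaultdict(float)
    seen = 0
    empty = 0
    while seen < max_messages and empty < max_empty_polls:
        t0 = time.perf_counter()
        msg = consumer.poll(poll_timeout)
        t1 = time.perf_counter()
        if msg is None:
            # Nothing arrived within poll_timeout.
            spent["poll_empty"] += t1 - t0
            empty += 1
            continue
        empty = 0
        if msg.error():
            # Error events are counted separately and not handed to the handler.
            spent["poll_error"] += t1 - t0
            continue
        spent["poll"] += t1 - t0
        handle(msg)
        spent["handle"] += time.perf_counter() - t1
        seen += 1
    return seen, dict(spent)


def main():
    # Imported here so the helper above can be exercised without a broker.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # placeholder, assumption
        "group.id": "timing-test",              # placeholder, assumption
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["my-topic"])            # placeholder, assumption
    try:
        count, spent = timed_consume(consumer, lambda m: print(m.value()), max_messages=500)
        print(count, spent)
    finally:
        consumer.close()
```

Calling `main()` after replacing the placeholders prints the message count and a per-step breakdown. If most of the time lands in `poll_empty`, the consumer is waiting for data rather than processing slowly, which points at configuration or the producer side rather than the handler.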

poonkothaip · 2024-11-20T04:23:46Z

Sure, Pranav. I will do that and get back to you, but it might take some time.
Is there any benchmarking of this library against other Python libraries for Kafka? I am using the latest confluent-kafka-python version anyway.

Thanks

poonkothaip · 2024-11-29T08:41:59Z

Hi,
I triggered a load of 500 messages and used basic consumer code that prints each message, in both library-based versions of the code.
The logs show that kafka-python took less than a minute to process all 500 messages (started and finished at 12:55), while confluent-kafka took 9 minutes (started at 12:43 and ended at 12:52).
Unfortunately, I am not able to attach the logs here, but it is basic code that only logs the message and does nothing else. I suspect the consumer poll takes more time in confluent-kafka than in kafka-python.

Can you share whether any benchmarking was done earlier?

Thanks

poonkothaip · 2025-01-13T03:29:05Z

Hi @pranavrth - Do we need to add any threading by default to improve performance? I run a single-threaded application in both cases (kafka-python and confluent-kafka), and the number of partitions is more than 1. Ideally, the number of threads would equal the number of partitions if ordering needs to be maintained. Are there any suggestions to further improve the performance of the confluent-kafka-based consumer? Also, how can we overcome the consumer poll taking more time in confluent-kafka?
Thanks
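
The thread-per-partition idea raised above could be sketched as follows. This is only an illustration under stated assumptions, not a recommendation from the maintainers: `run_workers`, `make_consumer`, and the settings in `main()` are hypothetical. Each thread owns its own consumer, and because all consumers share one `group.id`, the group coordinator spreads the topic's partitions across them, so ordering within a partition is preserved. Whether this helps depends on where the time is actually spent.

```python
import threading


def run_workers(make_consumer, topic, handle, num_workers, stop_event, poll_timeout=1.0):
    """Start num_workers threads, each polling its own consumer until stop_event is set."""

    def worker():
        consumer = make_consumer()  # one consumer instance per thread
        consumer.subscribe([topic])
        try:
            while not stop_event.is_set():
                msg = consumer.poll(poll_timeout)
                if msg is None:
                    continue  # nothing arrived within poll_timeout
                if msg.error():
                    print("consumer error:", msg.error())
                    continue
                handle(msg)
        finally:
            consumer.close()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    return threads


def main():
    # Imported here so run_workers can be exercised without a broker.
    from confluent_kafka import Consumer

    def make_consumer():
        return Consumer({
            "bootstrap.servers": "localhost:9092",  # placeholder, assumption
            "group.id": "per-partition-workers",    # same group for every thread
            "auto.offset.reset": "earliest",
        })

    stop = threading.Event()
    # 3 workers to match the 3 partitions mentioned earlier in the thread.
    threads = run_workers(make_consumer, "my-topic", lambda m: print(m.value()), 3, stop)
    try:
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        stop.set()
        for t in threads:
            t.join()
```

Starting more workers than there are partitions leaves the extra consumers idle, so the partition count is the natural upper bound for `num_workers`.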
