Consumer performance benchmarking #1853

Open
7 tasks
poonkothaip opened this issue Nov 18, 2024 · 6 comments

Comments

@poonkothaip

Description

We are using this library's basic consumer to consume messages from MSK. Performance-wise, it doesn't meet our defined requirements. Can anyone suggest how to tune the parameters to get the best performance? Processing 5k messages currently takes around 24-45 minutes, but our requirement is less than 5 minutes. The properties used are shown below.

image

How to reproduce

NA

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): 2.6.0 library and MSK
  • Apache Kafka broker version: 0.1.0
  • Client configuration: {...}-ECS Fargate service
  • Operating system: Linux
  • Provide client logs (with 'debug': '..' as necessary) NA
  • Provide broker log excerpts NA
  • Critical issue Performance
@pranavrth
Member

The performance is pretty bad. Can you check which part of the code is taking the most time? Are you performing some I/O operation between consuming messages?

Are you sure that you are using Apache Kafka version 0.1.0? 0.7.0 was released in 2012.

@poonkothaip
Author

Hi Pranav,
Thanks for your note.
Apache Kafka version: 2.7.2
After consuming, we only do a small log operation for now. But sometimes it takes up to 1 hour. We have 3 partitions.

Thanks
Poongkothai

@pranavrth
Member

Can you log the time for each step? Getting a message from the Python client should not take a lot of time.
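A minimal sketch of what logging the time for each step could look like, assuming the consumer loop is a plain `poll()` followed by a handler call. The helper name `timed_consume`, the `handle` callback, and the `max_idle_polls` cutoff are illustrative and not part of the library; the only client calls used are `poll(timeout)` and `msg.error()`.

```python
import time


def timed_consume(consumer, handle, max_messages, poll_timeout=1.0, max_idle_polls=10):
    """Consume up to max_messages, timing poll() and handle() separately.

    `consumer` is anything with a poll(timeout) method that returns a message
    or None, such as confluent_kafka.Consumer. `handle` is the application's
    own per-message callback (hypothetical name).
    """
    poll_seconds = 0.0
    handle_seconds = 0.0
    consumed = 0
    idle_polls = 0

    while consumed < max_messages and idle_polls < max_idle_polls:
        t0 = time.perf_counter()
        msg = consumer.poll(poll_timeout)
        t1 = time.perf_counter()
        poll_seconds += t1 - t0

        # No message within the timeout, or an error event: count as idle.
        if msg is None or msg.error():
            idle_polls += 1
            continue

        idle_polls = 0
        handle(msg)
        handle_seconds += time.perf_counter() - t1
        consumed += 1

    return {
        "consumed": consumed,
        "poll_seconds": poll_seconds,
        "handle_seconds": handle_seconds,
    }


# Usage with the real client (topic name and config values are placeholders):
#   from confluent_kafka import Consumer
#   consumer = Consumer({"bootstrap.servers": "...", "group.id": "..."})
#   consumer.subscribe(["my-topic"])
#   print(timed_consume(consumer, lambda m: None, max_messages=500))
```

If `poll_seconds` dominates while `handle_seconds` stays near zero, the time is going into waiting for messages rather than into the application code.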

@poonkothaip
Author

Sure Pranav. Will do and get back to you, but it might take some time.
Do we have any benchmarks comparing this with other Python libraries for Kafka? I am using the latest confluent-kafka-python version anyway.

Thanks

@poonkothaip
Author

Hi,
I triggered a load of 500 messages and ran a basic consumer that just prints each message, once with each library.
The logs show that kafka-python took less than a minute to process all 500 (started and finished at 12.55). confluent-kafka took 9 minutes (started at 12.43 and ended at 12.52).
Unfortunately I am not able to attach the logs here. But it is basic code that only logs the message and does nothing else. I suspect the consumer poll is taking more time in confluent-kafka than in kafka-python.

Can you share whether any benchmarking was done earlier?
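One way to probe the suspicion about per-call poll time is to fetch messages in batches instead of one at a time. The sketch below assumes the consumer exposes `consume(num_messages, timeout)`, as `confluent_kafka.Consumer` does; the helper name `consume_in_batches`, the batch size, and the `max_empty_batches` cutoff are illustrative choices, not recommended values.

```python
def consume_in_batches(consumer, handle, total, batch_size=100, timeout=1.0,
                       max_empty_batches=5):
    """Fetch up to `total` messages using batched consume() calls.

    `consumer` needs a consume(num_messages=..., timeout=...) method that
    returns a list of messages. `handle` is the application's own callback
    (hypothetical name). Returns the number of messages handled.
    """
    consumed = 0
    empty_batches = 0

    while consumed < total and empty_batches < max_empty_batches:
        wanted = min(batch_size, total - consumed)
        batch = consumer.consume(num_messages=wanted, timeout=timeout)

        handled = 0
        for msg in batch:
            if msg.error():
                continue  # skip error events in this sketch
            handle(msg)
            handled += 1

        consumed += handled
        # Stop eventually if nothing useful arrives for several calls in a row.
        empty_batches = 0 if handled else empty_batches + 1

    return consumed
```

Comparing the wall-clock time of this loop against a one-message-per-`poll()` loop on the same 500 messages would show whether per-call overhead matters at all, or whether the delay lies elsewhere, for example in group join or fetch settings.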

Thanks

@poonkothaip
Author

Hi @pranavrth - Do we need to add threading to improve performance? I run a single-threaded application in both cases (kafka-python and confluent-kafka), and the number of partitions is more than 1. Ideally the number of threads would equal the number of partitions when ordering needs to be maintained. Any suggestions to further improve the performance of the confluent-kafka based consumer? Also, how can we overcome the consumer poll taking more time in confluent-kafka?
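For illustration, the thread-per-partition idea could be sketched as one consumer instance per thread, all in the same consumer group, so that each partition is read by a single consumer and its ordering is preserved. This is an assumption-laden sketch rather than a recommendation from the maintainers: `run_consumer_threads`, `make_consumer`, and `handle` are hypothetical names, and only `subscribe()`, `poll()`, `msg.error()`, and `close()` are used from the client.

```python
import threading


def run_consumer_threads(make_consumer, topic, handle, num_threads, stop_event,
                         poll_timeout=1.0):
    """Start num_threads workers, each with its own consumer instance.

    `make_consumer` is a zero-argument factory (hypothetical) that returns a
    new consumer configured with the same group.id, for example
    lambda: confluent_kafka.Consumer(conf). Set `stop_event` to shut down.
    """
    def worker():
        consumer = make_consumer()
        consumer.subscribe([topic])
        try:
            while not stop_event.is_set():
                msg = consumer.poll(poll_timeout)
                if msg is None or msg.error():
                    continue
                handle(msg)
        finally:
            # Leave the group cleanly so partitions are reassigned promptly.
            consumer.close()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    return threads
```

Threads mostly help when `handle` spends its time on I/O. Since librdkafka already fetches in background threads, a handler that only logs is unlikely to be the bottleneck, so timing the steps first is probably worth doing before adding threads.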
Thanks
