
Performance benefits? #419

Closed
jorenham opened this issue May 9, 2022 · 11 comments

Comments

@jorenham

jorenham commented May 9, 2022

So I've been testing the performance a bit, varying e.g.

  • 1:1, 16:1, 64:1 producer : consumer ratio
  • normal/lifo/priority queue etc
  • asyncio/uvloop event loop
  • async -> async, async -> sync, sync -> sync
  • (probably more)

I found that janus queues are ~5x slower in sync -> sync, ~9x slower in sync -> async, and ~15x slower in async -> async. This is pretty much consistent across all parameter sets.

This confirmed my suspicion that the performance gain of parallel computation is often less than the cost of using e.g. threading.Lock a lot (the GIL certainly doesn't help either).

Right now, I can imagine that many users have incorrect expectations of janus. To avoid this, you could add an example that shows how janus can outperform single-threaded asyncio, by employing multiple threads. Additionally, a caveat about janus' performance would be helpful.
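The original benchmark code was never posted (and is reported lost later in this thread), but the sync -> sync case described above can be sketched roughly as follows. This is a reconstruction under assumptions, using the stdlib `queue.Queue` as the factory; a janus `Queue().sync_q` could be passed in for comparison (note that recent janus versions require a running event loop to construct the queue):

```python
import queue
import threading
import time

SENTINEL = None  # signals the consumer to stop

def producer(q, n):
    # Fill the queue as fast as possible.
    for i in range(n):
        q.put(i)
    q.put(SENTINEL)

def consumer(q, results):
    # Drain the queue as fast as possible, counting items.
    count = 0
    while q.get() is not SENTINEL:
        count += 1
    results.append(count)

def bench(queue_factory, n=100_000):
    # Time one producer thread against one consumer thread (1:1 ratio).
    q = queue_factory()
    results = []
    t = threading.Thread(target=consumer, args=(q, results))
    start = time.monotonic()
    t.start()
    producer(q, n)
    t.join()
    return results[0], time.monotonic() - start

items, seconds = bench(queue.Queue)
print(items)
```

The producer : consumer ratio from the bullet list above would be varied by starting several producer threads per consumer; this sketch shows only the 1:1 shape.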

@willstott101
Contributor

Interesting, can you clarify what you're comparing to in each of these cases?

A cursory glance over Janus' code, which I haven't really looked at before, shows that there would be some pretty easy performance gains from splitting Janus up into different queue types depending on the sender/receiver thread types. That'd be harder to both use and maintain, though, so it might be a hard sell.

@jettify
Member

jettify commented Mar 18, 2023

This library is about sync/async and async/sync communication; there is a price to pay for the extra synchronization. Use the appropriate queue for your case.

(Also, without benchmarks it is hard to discuss this issue in the first place; it's not clear what you are measuring.)

@jettify jettify closed this as completed Mar 18, 2023
@jorenham
Author

I fully agree that it's very difficult to correctly bridge async <-> sync code, and that Janus attempts to solve this in the context of producers - consumers, with a queue as medium.

The producer/consumer pattern is often used for performance reasons, e.g. map-reduce, wsgi, websockets, etc.

Janus is presented as a generic solution for bridging async/sync code using the producer/consumer pattern, and provides no specific use-cases, or one of those "what is this project / what this project is not" sections.

This makes it easy to think it's a good idea to use Janus in a performance-sensitive project.

But my quick-and-dirty benchmarks showed that, in a typical producer / consumer context, the Janus queue is significantly slower than conventional queues, likely overshadowing the performance that can be gained from employing the producer-consumer pattern altogether.

So, considering the number of users, I believe it is very important that the readme describe:

  • the precise problem that Janus aims to solve,
  • a realistic situation where using Janus is better (cleaner code, no significant performance impact) than the non-Janus alternative,
  • when it is not a good idea (e.g. a high-throughput performance bottleneck), and
  • transparent benchmarks.

@jettify
Member

jettify commented Mar 19, 2023

The library was created exactly for the reason stated in the readme:

Mixed sync-async queue, supposed to be used for communicating between classic synchronous (threaded) code and asynchronous (in terms of asyncio) one.

I do not think we ever claimed any performance gains, or anything. I agree that the docs could be better, and we are happy to accept any contributions there.

@x42005e1f
Contributor

I created Culsans, which should be more suitable for performance-sensitive applications. I would be glad if you, @jorenham, could test the performance of my library on the same tests and tell me if it is acceptable to you. My queues also behave as fair: if you replace Queue() in the example with culsans.Queue().sync_q, the numbers in the output will alternate, which eliminates resource starvation.

However, I, and others, would be interested to know exactly what you are measuring, because if you are simply comparing async-aware queues with synchronous queues that block the event loop, such tests are meaningless and irrelevant to this project. But if you are comparing with naive implementations that use event loop methods, and that are very popularized on StackOverflow (which is bad because they have pitfalls), then such a comparison makes sense.
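For context, the "naive implementations that use event loop methods" mentioned above usually look like the sketch below: a synchronous producer thread pushes into a plain `asyncio.Queue` via `loop.call_soon_threadsafe()`, since `asyncio.Queue` itself is not thread-safe. This is an illustration of that popularized pattern, not code from either library; one of its pitfalls is that `put_nowait` on an unbounded queue gives the producer no backpressure at all:

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    q = asyncio.Queue()

    def sync_producer():
        # A synchronous thread cannot call q.put_nowait() directly:
        # asyncio.Queue is not thread-safe, so each put must be
        # scheduled onto the event loop's own thread.
        for i in range(3):
            loop.call_soon_threadsafe(q.put_nowait, i)
        loop.call_soon_threadsafe(q.put_nowait, None)  # sentinel

    threading.Thread(target=sync_producer).start()

    items = []
    while (item := await q.get()) is not None:
        items.append(item)
    print(items)  # prints [0, 1, 2]
    return items

asyncio.run(main())
```

Callbacks scheduled from one thread via `call_soon_threadsafe` run in FIFO order, which is why the output order is preserved here; the lack of backpressure is what makes this pattern risky for high-throughput producers.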

@jorenham
Author

jorenham commented Nov 8, 2024

I believe that I was simply trying to figure out if janus could be a better alternative to the queue I was using at that moment, as I was dealing with hundreds of high-throughput websocket streams that needed to be synced and inserted in a database.

It was quite a while back, and I wasn't able to find the benchmark code I used back then. But if I remember correctly, the tests were very simple, and used a simple pub/sub pattern:

  • The "producer" function filled up the queue as fast as it could
  • The "consumer" function removed items from the queue as fast as it could

So as I explained in the issue here, I tested this with different producer-to-consumer ratios, and reported the differences between the different queue implementations.

I'm not involved in that project anymore for now, so hopefully you'll be able to replicate my results with this.

@x42005e1f
Contributor

Well, thank you for the information. From the description, it sounds like you were testing mostly non-blocking methods (whether explicitly or implicitly), since there is no active interaction between consumers and producers in this scenario. Janus is really bad in these tests, but Culsans is not, so I can assume I have solved this issue.

@x42005e1f
Contributor

x42005e1f commented Nov 8, 2024

Since such tests actually test the speed of put_nowait() and get_nowait() methods, let's represent the pure test as a regular loop with put-get in one thread. Below is the corresponding benchmark for the async -> async case.

janus & culsans benchmark
import asyncio
import time

import janus  # import culsans as janus


async def main():
    queue = janus.Queue()

    put = queue.async_q.put_nowait
    get = queue.async_q.get_nowait

    start = time.monotonic()

    for _ in range(100000):
        put(42)
        get()

    print(time.monotonic() - start)

    queue.close()


asyncio.run(main())

I took CPython 3.10 to match the year of this issue and ran this benchmark on it. With Janus this test for me runs in almost 15 seconds and slows down the closing of the event loop. With Culsans it runs in 0.15 seconds and the event loop closes without delay. These numbers are real; I only rounded them to the second non-zero digit. And on PyPy the difference is 10 times bigger (Culsans is faster than Janus by almost a thousand times).

Update: since 4a57895, no-wait tests do not call notification methods, so in those tests, Janus performance became comparable to Culsans performance. But Janus can still perform badly on blocking calls, so this issue is only half solved.

@kristoftorok

Hello, I'm a bit confused about whether to stay with Janus or switch to Culsans.

I'm building an application that logs network traffic. The log parsing part is single threaded synchronous code and the writing to the database is async.

I want to optimise the application to be able to process thousands of log entries/second.

So I'm not really sure if a queuing system in Python is even a good idea or if I should switch to an MQ like RabbitMQ. (Testing will show)

But before I change the whole queuing system in my application, I want to make sure that I really need it.

So this is how the app works:

The sync thread writes data to the queue as fast as it can:

('10.144.3.10', '10.8.1.66', 53, 17, 'UDP', None, '2025-01-09T14:36:48.650996')

The async thread(s) read the data from the queue and write it to the DB as fast as they can.

So for this type of use case, does Culsans help with the performance?

@x42005e1f
Contributor

Hello, thank you for your question.

Yes, Culsans can indeed improve performance in your case. Since version 1.2.0 the performance of Janus is much improved, but it still creates new tasks to notify threads. Culsans does not create new tasks and inherits aiologic semantics, according to which the shortest path to wake up a thread/task is selected. But the speedup is likely to be small unless you are running PyPy on old hardware (as you can see in the Culsans results at the end of its README, it currently gives only 2x speedup in a single thread test).

I also note that Janus supports only one asynchronous thread (event loop). With Culsans you can use multiple threads, but does that make sense outside of a free-threaded mode?

If you do not use the extra features of Culsans, it is fully compatible with Janus - you can switch between them just by swapping imports. Culsans depends on aiologic, which is not currently covered by tests, so you may prefer to stay on Janus as a more reliable option.

And regarding the problem you described: yes, queues are handy but seem to be optional. You may consider using asyncio.run_coroutine_threadsafe() to avoid unnecessary overhead. Also, kloop was published 3 years ago to minimize system calls (and hence expensive context switching, whose impact is clearly visible in my aiologic benchmarks), but it has unfortunately not been updated since then. Ideally, your problem could be solved with something like Elixir.
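The queue-free alternative mentioned above can be sketched like this: the synchronous parser thread submits each database write directly to the event loop with `asyncio.run_coroutine_threadsafe()`. The records and the `sink` list are placeholders standing in for real log entries and a real async database insert:

```python
import asyncio
import threading

async def write_to_db(record, sink):
    # Stand-in for an asynchronous database insert.
    sink.append(record)

def parser_thread(loop, sink):
    # The synchronous parser hands each record straight to the event
    # loop; no intermediate queue is involved.
    for record in [("10.0.0.1", 53), ("10.0.0.2", 443)]:
        future = asyncio.run_coroutine_threadsafe(write_to_db(record, sink), loop)
        future.result()  # block until the loop has run the insert

async def main():
    sink = []
    loop = asyncio.get_running_loop()
    t = threading.Thread(target=parser_thread, args=(loop, sink))
    t.start()
    # Join in a worker thread so the event loop stays free to run
    # the coroutines the parser submits.
    await asyncio.to_thread(t.join)
    print(len(sink))
    return sink

asyncio.run(main())
```

Calling `future.result()` after each submission serializes the writes; dropping it would let the parser run ahead of the database, at the cost of losing backpressure.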

@kristoftorok

Thanks for your answer!

Well, Elixir isn't really an option for me because I've never used it and learning a new language just for this project seems kind of unnecessary.

The main reason why I want to stick with the queue system is that the log messages that are pushed into the queue are not always at the same rate, which means I need to somehow handle burst entries without putting too much load on the database.

With the queue system, I can easily scale down the database writes if there are too many log entries in the queue.

For example, if there is a big spike in the logs and 5000 log entries are pushed into the queue, I can limit the database writes to read only 500 entries from the queue per second. So the queue just acts as a buffer for spikes. Of course this introduces some latency into the writes, but for my use case it's not a big deal.
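The buffering scheme described above can be sketched as a batch-draining consumer on a plain `asyncio.Queue`; the burst size, batch size, and interval here are illustrative stand-ins for the 5000-entry spike and 500-writes-per-second cap:

```python
import asyncio

async def producer(q):
    # Simulate a burst: 20 log entries arrive at once.
    for i in range(20):
        q.put_nowait(i)

async def rate_limited_consumer(q, batch_size, interval, batches):
    # Drain at most batch_size entries per interval, so a burst is
    # absorbed by the queue instead of hammering the database.
    written = []
    for _ in range(batches):
        batch = []
        while len(batch) < batch_size and not q.empty():
            batch.append(q.get_nowait())
        written.extend(batch)  # stand-in for one bulk DB write
        await asyncio.sleep(interval)
    return written

async def main():
    q = asyncio.Queue()
    await producer(q)
    written = await rate_limited_consumer(q, batch_size=5, interval=0.01, batches=4)
    print(len(written))
    return written

asyncio.run(main())
```

The queue acts purely as the spike buffer described above: entries wait in it, and latency grows with queue depth, but the write rate to the database never exceeds batch_size per interval.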

Anyway, thank you for your help!
I will keep experimenting with different solutions until I find the right one. :D
