net: "queue: Use k_poll if enabled" commit regressed BSD Sockets performance #2422
Comments
by Paul Sokolovsky: Luiz Augusto von Dentz: FYI I'm working on a native Zephyr testcase.
by Luiz Augusto von Dentz: So what exactly is this, some sort of benchmark? Anyway, the k_poll version should be getting similar numbers, so perhaps something else is causing the problem; 1 second is awfully long for k_fifo_get.
by Paul Sokolovsky: Luiz Augusto von Dentz:
It uses ab, Apache Bench, which is indeed a well-known and accessible HTTP benchmark tool. But I use it not as a benchmark tool but as an HTTP request generator for a kind of smoke load testing: if the server can sustain 1000 requests, it can move on to the next stage of testing with 10000 requests, and if it can sustain that, then perhaps it works (based on the anecdote about production testing of Android phones, where a build that survives 24 hours of the "monkey" tool gets the "ship it!" command). So, as promised, I prepared a native Zephyr example: zephyrproject-rtos/zephyr#1378 The way to test it:
by Paul Sokolovsky:
Perhaps something else is causing it, like a race condition or a bug in the scheduler. But zephyrproject-rtos/zephyr@84db641 is exactly what triggered this regression, so I'd appreciate help with investigating it (I, for one, have no idea how to approach it).
by Paul Sokolovsky: More info: with the following patch:
Slowdowns happen much less frequently. Like, none for 100 requests, and for 1000 it's:
by Paul Sokolovsky: Actually, having only the printk("p out\n"); line active is enough to alleviate the slowdown.
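For context, here is a rough sketch of where such a debug print sits relative to the k_poll() call in the k_poll-based k_queue_get() path; this is paraphrased, not the exact Zephyr source of that era, and the helper name and details are approximate:

```c
/* Rough sketch of the k_poll-based queue get path; structure and names are
 * paraphrased from the Zephyr code under discussion, not copied from it.
 */
#include <kernel.h>
#include <misc/printk.h>

static void *queue_get_via_poll(struct k_queue *queue, s32_t timeout)
{
	struct k_poll_event event;

	k_poll_event_init(&event, K_POLL_TYPE_FIFO_DATA_AVAILABLE,
			  K_POLL_MODE_NOTIFY_ONLY, queue);

	if (k_poll(&event, 1, timeout) != 0) {
		return NULL;            /* timeout or error */
	}

	printk("p out\n");              /* the debug print that alleviates the slowdown */

	/* data should be available now; pop it under the IRQ lock */
	unsigned int key = irq_lock();
	void *data = sys_slist_get(&queue->data_q);

	irq_unlock(key);
	return data;
}
```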
by Paul Sokolovsky: Further update: all of the above happens on qemu_x86. I tested on real hardware, frdm_k64f, and can't reproduce it:
This is the worst outcome of all, of course, when you can't trust your development tool (the emulator). However, there are still other points of difference: architecture and interface speed (SLIP vs Ethernet). Will try qemu_cortex_m3 next.
by Paul Sokolovsky:
That almost helped. The first run with -n10000 went perfectly. On repeating it, the test was aborted with "conn reset by peer", which is surely a different issue. Scaling down to -n1000, I still regularly get:
In other words, unmodified qemu_cortex_m3 behaves like qemu_x86 with the printk added after the k_poll() call (above).
by Andy Ross: So... the hypothesis is that something in k_poll() (or the k_queue use of k_poll) is failing to wake up synchronously and we're hitting 1000ms timeouts, but only on some systems? Have we verified that the "slowdowns" are indeed a timeout condition? The printk interaction seems weird; on qemu that's just a bunch of synchronous register writes, so no changes to interrupt delivery order should be happening. Can you replace the printk() with a k_busy_wait() and fix it too? Do we know which queue is involved? Have we ruled out delays on insert from some other source (seems unlikely, I guess)?
by Andy Ross: I spent some time staring at the queue and poll implementations and can't see a race. But here's one shot in the dark: Ben's code has some subtle locking at the entry to k_poll() to allow the initialization of the (potentially large) poll array to be done without the IRQ lock held for the whole process. I can't see anything wrong with it, but the fact that the printk() above "fixes" the issue implies that the race might be somewhere in this vicinity: a "wakeup event" happens "very soon, maybe already" relative to the k_poll() call, and simply delaying it a little causes poll to hit the latter code path instead of the former. If you have the rig ready, can you try this experiment and see if it affects anything? It wouldn't be a fix (it simply keeps the lock the whole time and is thus subject to latency problems with big poll arrays), but if it works it helps to localize the problem:
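To make the two code paths concrete, here is a purely conceptual sketch of that entry loop: one path for an event whose condition is already met at entry, and one that registers for a wakeup and sleeps. This is not Andy's patch (which isn't reproduced here) and not the actual k_poll() source; the helper functions are hypothetical stand-ins:

```c
/* Conceptual sketch only: NOT the real k_poll() and NOT the proposed patch.
 * The helpers below are hypothetical stand-ins for the real internals.
 */
#include <kernel.h>

bool event_condition_met(struct k_poll_event *event);       /* hypothetical */
void mark_event_ready(struct k_poll_event *event);          /* hypothetical */
void register_event_for_wakeup(struct k_poll_event *event); /* hypothetical */
int wait_for_wakeup(s32_t timeout);                         /* hypothetical */

int poll_entry_sketch(struct k_poll_event *events, int num_events, s32_t timeout)
{
	bool already_ready = false;

	for (int i = 0; i < num_events; i++) {
		/* the lock is taken and dropped per event, so a large array
		 * can be processed without holding the IRQ lock throughout
		 */
		unsigned int key = irq_lock();

		if (event_condition_met(&events[i])) {
			/* condition already met at entry: no sleep needed */
			mark_event_ready(&events[i]);
			already_ready = true;
		} else {
			/* not ready yet: register for a wakeup, sleep below */
			register_event_for_wakeup(&events[i]);
		}

		irq_unlock(key);
	}

	if (already_ready) {
		return 0;
	}

	/* the suspected race: a wakeup arriving around the loop above must not
	 * be lost; the experiment holds the lock across the whole loop instead
	 */
	return wait_for_wakeup(timeout);
}
```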
by Paul Sokolovsky: Andy Ross: Thanks much for looking into this! Let me start from the end, by trying the patch you gave. Somehow it didn't apply for me as is, so I redid it manually, catching another case of irq_lock to remove; see my exact patch below. Unfortunately, it didn't resolve the issue:
The patch:
by Paul Sokolovsky:
Well, it can be even worse than that:
Otherwise yes, the hypothesis is that with slow enough packet speeds (like with SLIP), there's some race or something similar leading to an "extra sleep".
by Paul Sokolovsky:
Well, that was exactly my question: where does this 1s (1000ms) delay even come from? One such figure I know of is CONFIG_NET_TCP_ACK_TIMEOUT. I changed it to 500, but no, the delays are still integer multiples of 1000ms (1s, 2s, 3s, 4s seen). Grepping through .config, I don't see any other "1000" figures; I would need to grep through the codebase for hardcoded values.
by Paul Sokolovsky:
Even before this, I had tried k_yield() there, as seen (with some worry on my part) in some of Jukka Rissanen's networking patches. It didn't help visibly. With k_busy_wait(500) after k_poll() and 1000 requests it's:
With k_busy_wait(1000) after k_poll() and 1000 requests it's:
With k_busy_wait(2000) after k_poll() and 1000 requests it's:
So, k_busy_wait() definitely "helps", but it doesn't resolve the issue, and the behavior isn't even a monotonic function of the delay used (of course, there's enough variability/nondeterminism in the first place).
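Concretely, the experiment swaps the debug print in the earlier queue-get sketch for a short busy wait; a minimal sketch, again assuming that placement:

```c
/* Same sketch as before, with the debug printk replaced by a busy wait
 * (values of 500, 1000 and 2000 microseconds were tried).
 */
#include <kernel.h>

static void *queue_get_via_poll_busywait(struct k_queue *queue, s32_t timeout)
{
	struct k_poll_event event;

	k_poll_event_init(&event, K_POLL_TYPE_FIFO_DATA_AVAILABLE,
			  K_POLL_MODE_NOTIFY_ONLY, queue);

	if (k_poll(&event, 1, timeout) != 0) {
		return NULL;
	}

	k_busy_wait(500);               /* was: printk("p out\n"); */

	unsigned int key = irq_lock();
	void *data = sys_slist_get(&queue->data_q);

	irq_unlock(key);
	return data;
}
```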
by Paul Sokolovsky:
The exact one - no, but (at least) the following queues are involved:
- RX packet queue
- Accept socket queue (new connection contexts go there)
- Per-socket RX packet queue
I modified the socket code to print k_uptime_get_32() at some points, and here is what I see:
38690 zsock_accepted_cb 0x004053b8
38740 zsock_accepted_cb 0x004053b8
39810 zsock_accepted_cb 0x004053b8
40860 zsock_accepted_cb 0x004053b8
So the first 2 connections got served without a delay, then there's 1s between the following 2. But once zsock_accepted_cb() is called (which does k_fifo_put), there's no big delay until that connection is served; the delay happens in between connection acceptance and serving. Will patch in more printk...
What do you mean/have in mind? The processing should be as above: a driver puts a packet into the RX queue, from which it's taken by the RX thread and routed to either the accept or the recv callback, both of which put it into other queues (which are actually union'ed, as their lifetimes don't overlap), from which the app code pulls it via calls to accept() or recv().
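The timestamp instrumentation amounted to something like the following; the callback signature follows the net_context accept callback, but the exact placement of the print, the includes, and the rest of the body are assumptions, since the actual patch isn't shown in the thread:

```c
/* Sketch of the k_uptime_get_32() instrumentation in the sockets layer.
 * The real zsock_accepted_cb() also queues new_ctx onto the listening
 * socket's accept FIFO via k_fifo_put(); that part is only hinted at here.
 */
#include <kernel.h>
#include <misc/printk.h>
#include <net/net_context.h>
#include <net/socket.h>

static void zsock_accepted_cb(struct net_context *new_ctx,
			      struct sockaddr *addr, socklen_t addrlen,
			      int status, void *user_data)
{
	/* prints e.g. "38690 zsock_accepted_cb 0x004053b8" */
	printk("%u zsock_accepted_cb %p\n", k_uptime_get_32(), new_ctx);

	/* ... real callback body: k_fifo_put() the new context for accept() ... */
}
```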
by Paul Sokolovsky: Luiz Augusto von Dentz on IRC suggested:
Reported by Paul Sokolovsky:
With a sample "http_server" written in MicroPython, with BSD Sockets as the underlying implementation, I get:
With "git checkout 84db641de66463550fb41f083dbdf0b02fc78284"
With "git checkout 84db641de66463550fb41f083dbdf0b02fc78284^" (note ^ at the end)
The sockets implementation heavily relies on FIFOs (and uses k_poll on a FIFO too). Per the above, there's a clear case of "some FIFO operation took +1s with the k_poll-based k_fifo_get vs the previous wait_q-based implementation".
zephyrproject-rtos/zephyr@84db641
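For context, the usage pattern that the regression hits boils down to blocking on a k_fifo with a timeout; a minimal sketch, with names other than the k_fifo_* API being illustrative:

```c
/* Minimal sketch of the pattern: the sockets layer parks on a k_fifo until
 * an RX/accept callback feeds it. After the commit above, k_fifo_get() is
 * implemented on top of k_poll() when CONFIG_POLL is enabled.
 */
#include <kernel.h>

K_FIFO_DEFINE(accept_q);        /* illustrative: new connection contexts land here */

void *wait_for_connection(s32_t timeout_ms)
{
	/* previously this blocked on the fifo's wait_q directly; now it goes
	 * through k_poll() internally
	 */
	return k_fifo_get(&accept_q, timeout_ms);
}
```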
(Imported from Jira ZEP-2593)