High memory usage and latency when a program produces output too quickly #525
Currently there is a workaround (a fixed sleep). However, as there is no right amount of sleep that works for every machine, the proper solution would be to have back pressure between the `Pty` and `Screen` threads.
I made a proof-of-concept using a bounded channel.
TL;DR: simply changing the channel to a bounded one is not enough on its own.
I found a workable way to add back pressure between `Pty` and `Screen`. The idea is to have a dedicated bounded channel between them, while other producers keep sending to `Screen` over the unbounded channel. This required a change from `mpsc` to crossbeam channels, as selecting on multiple channels is needed.

@imsnif If this is an acceptable approach (poc patch here) I'll whip up a PR-able implementation.
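The dual-queue receive described above can be sketched with the standard library. Note the hedge: zellij's actual implementation uses crossbeam's `select!`, which std `mpsc` lacks in stable form, so this stdlib-only approximation polls both receivers instead; `recv_any` and the channel names are hypothetical stand-ins, not zellij APIs.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, TryRecvError};

// Hypothetical stand-in for a Bus with two receivers: poll the bounded
// (Pty) queue and the unbounded (other producers) queue until one yields.
// Zellij uses crossbeam's select! here; std mpsc has no stable select,
// so this sketch busy-polls instead.
fn recv_any<T>(bounded: &Receiver<T>, unbounded: &Receiver<T>) -> Option<T> {
    loop {
        let mut disconnected = 0;
        for rx in [bounded, unbounded] {
            match rx.try_recv() {
                Ok(v) => return Some(v),
                Err(TryRecvError::Empty) => {}
                Err(TryRecvError::Disconnected) => disconnected += 1,
            }
        }
        if disconnected == 2 {
            return None; // all senders are gone
        }
        std::thread::yield_now();
    }
}

fn main() {
    let (pty_tx, pty_rx) = sync_channel::<&str>(2); // bounded: Pty -> Screen
    let (other_tx, other_rx) = channel::<&str>();   // unbounded: everyone else

    other_tx.send("render-from-plugin").unwrap();
    pty_tx.try_send("bytes-from-pty").unwrap();

    // In this sketch the bounded queue happens to be polled first.
    assert_eq!(recv_any(&pty_rx, &other_rx), Some("bytes-from-pty"));
    assert_eq!(recv_any(&pty_rx, &other_rx), Some("render-from-plugin"));
}
```

The key property is that only the Pty side gets a bounded sender; everyone else keeps an unbounded one, which is what avoids the deadlock discussed later in the thread.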
@kxt - at a brief glance this looks like a good direction (though I haven't run it or looked too deeply at it yet). I'm good with moving forward with a PR so we can look deeper. I might want to bring in @kunalmohan to also take a look (if he has the time) since he's done a lot of work in this area. |
@kxt great work in #523! The throughput has increased by a LOT!
I've created two PRs. The first one contains some refactors preparing for the fix, including switching from `mpsc` to crossbeam channels. The second draft PR contains the actual fix, and it seems to work OK. However, some of the testcases fail unless extra sleeps are added. I've noticed that many testcases already contain sleeps. Can someone help me figure out what's going on with the testcases? @kunalmohan ?
Hey @kxt - I added a review in the first PR. About the tests in the draft PR (which I think you didn't yet create? unless I missed it?): our testing infrastructure is one of the most broken parts of the code-base. It was one of the first pieces of Zellij and is very long overdue for an overhaul. Since it works by comparing snapshots of the app state (essentially what's on screen in certain situations) and the app includes some asynchronous elements, instead of creating a proper synchronization we just kept adding sleeps to make it work properly. I am not proud of this, but I am honest about it. :) I plan on devoting attention to it very soon after all the other urgent stuff. So I'm afraid the best answer I can give you now is to add sleeps where necessary to wait for data to come from the fake pty before sending the next command. (also, there might still be some atomicity issues left where stdin is concerned, though I think we got most of those already). If you want further specific help with this, let me know. |
Sorry, I did not link the draft PR properly, it's in my fork of zellij right now. I figured that the large amount of cross-thread communication makes it hard to tell whether something has already happened or not. If you say sleeps are the way to go for now, I'll add them where necessary. Do you have any concrete plans for the test overhaul?
Sounds good.
An idea we've been throwing around the team is to move whatever makes sense in the integration tests to be unit tests, and make the rest be proper end-to-end tests. Our idea is to spin up a docker container and communicate with it through ssh, thus simulating the full app cycle. This still doesn't solve the synchronization problem though, but I have a feeling that's the sort of issue you solve as you work on it. If you have any other ideas, they would of course be most welcome. |
Pty reads a command's output and feeds it to Screen using an unbounded queue. However, if the command produces output faster than `Screen` can render, `Pty` still pushes it onto the queue, causing it to grow indefinitely, resulting in high memory usage and latency.

This patch fixes this by using a bounded queue between Pty and Screen, so if Screen can't keep up with Pty, the queue fills up, exerting back pressure on Pty and making it read the command's output only as fast as Screen renders it.

The unbounded queue is kept between Screen and producers other than Pty to avoid a deadlock such as this scenario:

* the pty thread fills up the screen queue as soon as the screen thread pops something from it
* the wasm thread is processing a Render instruction, blocking on the screen queue
* the screen thread is trying to render a plugin pane; it attempts to send a Render instruction to the blocked wasm thread running the plugin

This patch also adds a generous amount of sleeps to the integration tests, as having back pressure changes the timing of how instructions are processed.

Fixes zellij-org#525.
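The back-pressure mechanism the commit message describes can be demonstrated with a stdlib bounded channel. This is a minimal sketch, not zellij code: `queue_is_full` and the capacity are made up for illustration, and `std::sync::mpsc::sync_channel` stands in for the crossbeam bounded channel the patch actually uses.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Returns true when the bounded queue rejects a send because it is full.
fn queue_is_full(tx: &SyncSender<String>) -> bool {
    matches!(tx.try_send("overflow".into()), Err(TrySendError::Full(_)))
}

fn main() {
    // Bounded channel of capacity 2, standing in for the Pty -> Screen queue.
    let (tx, rx) = sync_channel::<String>(2);
    tx.try_send("line 1".into()).unwrap();
    tx.try_send("line 2".into()).unwrap();

    // Queue full: a blocking `send` here would park the producer until the
    // consumer drains an item -- this is the back pressure on Pty.
    assert!(queue_is_full(&tx));

    // Once the consumer (Screen) pops an item, the producer can proceed.
    assert_eq!(rx.recv().unwrap(), "line 1");
    assert!(!queue_is_full(&tx));
}
```

A blocking `send` on a full `sync_channel` parks the sending thread, which is exactly why the Pty thread can no longer outrun the renderer, and also why giving the wasm thread a bounded sender would risk the deadlock listed above.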
* refactor(fakes): clean up add_terminal_input

* refactor(fakes): append whole buf to output_buffer in FakeStdoutWriter::write

* refactor(fakes): append whole buf to output_buffer in FakeInputOutput::write_to_tty_stdin

* fix(fakes): allow partial reads in read_from_tty_stdout

  This patch fixes two bugs in read_from_tty_stdout:

  * if there was a partial read (i.e. `bytes.read_position` is not 0 but less than `bytes.content.len()`), subsequent calls to `read_from_tty_stdout` would fill `buf` starting at index `bytes.read_position` instead of 0, leaving the range 0..`bytes.read_position` untouched
  * if `buf` was smaller than `bytes.content.len()`, a panic would occur

* refactor(channels): use crossbeam instead of mpsc

  This patch replaces mpsc with crossbeam channels because crossbeam supports selecting on multiple channels, which will be necessary in a subsequent patch.

* refactor(threadbus): allow multiple receivers in Bus

  This patch changes Bus to use multiple receivers. Method `recv` returns data from all of them. This will be used in a subsequent patch for receiving from bounded and unbounded queues at the same time.

* refactor(channels): remove SenderType enum

  This enum has only one variant, so the entire enum can be replaced with the innards of said variant.

* refactor(channels): remove Send+Sync trait implementations

  The implementation of these traits is not necessary, as SenderWithContext is automatically Send and Sync for every T and ErrorContext that is Send and Sync.
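The partial-read fix in `read_from_tty_stdout` can be sketched as follows. This is a hedged reconstruction, not the actual fake's code: `read_from_buffer` and its parameters are hypothetical names mirroring the `bytes.content` / `bytes.read_position` fields the commit message mentions. The two bugs map to writing at `buf[0]` rather than `buf[read_position]`, and capping the copy length at `buf.len()` so small buffers cannot panic.

```rust
// Copy from the unread region of `content` (starting at `read_position`)
// into `buf` starting at index 0, never copying more than `buf.len()`
// bytes. Returns the number of bytes read, advancing `read_position`.
fn read_from_buffer(content: &[u8], read_position: &mut usize, buf: &mut [u8]) -> usize {
    let unread = &content[*read_position..];
    let n = unread.len().min(buf.len()); // cap at buf size: no panic
    buf[..n].copy_from_slice(&unread[..n]); // write at index 0, not read_position
    *read_position += n;
    n
}

fn main() {
    let content = b"hello";
    let mut pos = 0usize;
    let mut buf = [0u8; 2];

    // Two-byte buffer forces partial reads across three calls.
    assert_eq!(read_from_buffer(content, &mut pos, &mut buf), 2);
    assert_eq!(&buf, b"he");
    assert_eq!(read_from_buffer(content, &mut pos, &mut buf), 2);
    assert_eq!(&buf, b"ll");
    assert_eq!(read_from_buffer(content, &mut pos, &mut buf), 1);
    assert_eq!(buf[0], b'o');
}
```

This follows the usual `Read::read` contract: fill from the start of `buf`, return how many bytes were written.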
Version

0.12.0

Reproduction

Run

```
time cat 1_000_000_lines.txt
```

and compare it with the time measured by wall clock. The amount of time `time` reports is significantly less than what is observed. Having a slow computer and using a non-optimized (non-`release`) build helps.

Root cause

A program's output is being processed by the `Pty` thread. This thread reads the output and propagates it to the `Screen` thread for rendering through an unbounded channel called `to_screen`. If the program produces output faster than `Screen` can render, `to_screen`'s queue starts to grow indefinitely. This results in growing memory usage and input latency.

An alternative explanation (kudos to @aeinstein): running nearly at the speed of light, `cat` is experiencing time dilation compared to a stationary observer.

As PR #523 increases the throughput of the `Pty` thread, it exacerbates this issue.
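The root cause — an unbounded queue that never pushes back on a fast producer — can be reproduced in isolation. A minimal sketch, assuming nothing about zellij's code: `flood` is a made-up helper, and std's unbounded `mpsc::channel` stands in for `to_screen`.

```rust
use std::sync::mpsc::channel;

// Sends `n` items into an unbounded channel and returns the backlog size.
// Every send succeeds immediately, so the queue simply grows: this is the
// memory-and-latency failure mode when the consumer (Screen) lags behind.
fn flood(n: u64) -> usize {
    let (tx, rx) = channel::<u64>();
    for i in 0..n {
        // `send` on an unbounded channel never blocks the producer.
        tx.send(i).unwrap();
    }
    rx.try_iter().count()
}

fn main() {
    // With no back pressure, a fast `cat` can buffer its whole output here.
    assert_eq!(flood(100_000), 100_000);
}
```

Every buffered item is both memory held and latency added, since input events queued behind the backlog are rendered only after it drains — which is why the `time` output disagrees with the wall clock.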