Zellij is killed by OOM #803

Closed
tailhook opened this issue Oct 25, 2021 · 11 comments

Comments

@tailhook

Basic information

zellij --version: 0.18.1
tput lines: 42
tput cols: 159
uname -av or ver(Windows): Linux nixos 5.14.10 #1-NixOS SMP Thu Oct 7 05:53:20 UTC 2021 x86_64 GNU/Linux
alacritty --version: alacritty 0.9.0

Further information
Zellij was killed by the OOM killer, dropping all running processes. I'm not sure:

  1. Which zellij process was dropped first (the server?)
  2. Why was it killed? Total memory usage on the system wasn't high. Is it because of high virtual memory (VM) usage? (Although 78 GB seems to be quite usual for zellij with a few tabs open.)

Prior to the OOM kill there was a spike in CPU usage. I'm not sure whether that was caused by zellij, by the kernel trying to compress memory (which is usually the case before an OOM), or by another process running inside zellij.

Here is the dmesg log:

[643020.520328] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-2.scope,task=zellij,pid=2377,uid=1000
[643020.520390] Out of memory: Killed process 2377 (zellij) total-vm:78086060kB, anon-rss:555168kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:1556kB oom_score_adj:0
[643020.551282] oom_reaper: reaped process 2377 (zellij), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
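
To put the numbers in the kill report side by side, here is a small sketch that pulls total-vm and anon-rss out of the "Out of memory" line above and converts them. It assumes the kernel's "kB" means 1024 bytes; the helper name field_kb is made up for this example. With those assumptions the report works out to roughly 74.5 GiB of virtual size against roughly 542 MiB resident, so the resident part is well under 1% of the virtual size.

```rust
/// Extracts the number of kB that follows `key` (for example "total-vm:")
/// in a kernel OOM report line. `field_kb` is a hypothetical helper name.
fn field_kb(line: &str, key: &str) -> Option<u64> {
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    let end = rest.find("kB")?;
    rest[..end].parse().ok()
}

fn main() {
    // Line copied from the dmesg output in this report.
    let line = "Out of memory: Killed process 2377 (zellij) total-vm:78086060kB, \
                anon-rss:555168kB, file-rss:0kB, shmem-rss:0kB, UID:1000 \
                pgtables:1556kB oom_score_adj:0";

    let total_vm = field_kb(line, "total-vm:").expect("total-vm field present");
    let anon_rss = field_kb(line, "anon-rss:").expect("anon-rss field present");

    // Assumption: 1 kB in the kernel log is 1024 bytes.
    println!("total-vm: {:.2} GiB", total_vm as f64 / (1024.0 * 1024.0));
    println!("anon-rss: {:.2} MiB", anon_rss as f64 / 1024.0);
    println!(
        "resident share of virtual size: {:.2}%",
        100.0 * anon_rss as f64 / total_vm as f64
    );
}
```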

Here is the panic I saw in the terminal running alacritty:

Originating Thread(s):
1. screen_thread: Render
2. ipc_server: Render

Error: thread 'router' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })': /build/source/zellij-utils/src/ipc.rs:167
   0: zellij_utils::errors::handle_panic
   1: std::panicking::rust_panic_with_hook
   2: std::panicking::begin_panic_handler::{{closure}}
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   4: rust_begin_unwind
   5: core::panicking::panic_fmt
   6: core::result::unwrap_failed
   7: <zellij_client::os_input_output::ClientOsInputOutput as zellij_client::os_input_output::ClientOsApi>::recv_from_server
   8: std::sys_common::backtrace::__rust_begin_short_backtrace
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: std::sys::unix::thread::Thread::new::thread_start
  11: start_thread
  12: __clone
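
The panic message itself ("failed to fill whole buffer", kind UnexpectedEof) is what the standard library's read_exact returns when the other end goes away mid-read, which fits the server having just been killed. Below is a minimal sketch of treating that case as a disconnect instead of unwrapping it. The length-prefixed framing, the Recv enum and recv_message are assumptions made for this illustration; this is not zellij's actual IPC code.

```rust
use std::io::{self, Cursor, ErrorKind, Read};

/// Outcome of trying to read one message from the peer.
#[derive(Debug, PartialEq)]
enum Recv {
    Message(Vec<u8>),
    Disconnected,
}

/// Reads one message framed as a 4-byte big-endian length plus payload.
/// (Hypothetical framing, chosen only to keep the example self-contained.)
fn recv_message<R: Read>(reader: &mut R) -> io::Result<Recv> {
    let mut len_buf = [0u8; 4];
    match reader.read_exact(&mut len_buf) {
        Ok(()) => {}
        // The peer closed the connection: report it instead of panicking.
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => return Ok(Recv::Disconnected),
        Err(e) => return Err(e),
    }

    let len = u32::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    match reader.read_exact(&mut payload) {
        Ok(()) => Ok(Recv::Message(payload)),
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => Ok(Recv::Disconnected),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // One complete message followed by a truncated header, as if the
    // sender had been killed halfway through writing.
    let mut stream = Cursor::new(vec![0, 0, 0, 2, b'o', b'k', 0, 0]);
    loop {
        match recv_message(&mut stream)? {
            Recv::Message(bytes) => println!("got {} bytes", bytes.len()),
            Recv::Disconnected => {
                println!("peer went away, shutting down cleanly");
                break;
            }
        }
    }
    Ok(())
}
```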

Also, I've just noticed that simply creating a few tabs (running zsh, if that matters) spawns processes, and with each new one both virtual and RSS memory grow:


❯ procs zellij -i VmRss -i VmSize
 PID:▲  User │ TTY   CPU MEM CPU Time    VmRSS   VmSize │ Command
             │       [%] [%]           [bytes]  [bytes] │
 650693 pc   │ pts/0 0.0 0.1 00:00:00   8.785M 279.758M │ zellij
 650696 pc   │       0.0 0.8 00:00:07 120.613M  73.955G │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us
 650706 pc   │ pts/2 0.0 0.0 00:00:00   1.262M 345.555M │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us
 651355 pc   │ pts/3 0.0 0.2 00:00:00  31.113M  13.564G │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us
 651424 pc   │ pts/4 7.2 0.3 00:00:00  48.949M  25.629G │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us
 651491 pc   │ pts/6 0.0 0.4 00:00:00  65.711M  37.695G │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us
 651547 pc   │ pts/7 7.2 0.5 00:00:00  82.047M  49.761G │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us
 651613 pc   │ pts/9 0.0 0.6 00:00:00  99.004M  61.889G │ /nix/store/rcydc25jrcwlm5dg1mf48f881z1vhxva-zellij-0.18.1/bin/zellij --server /run/us

This pattern looks very similar to how descriptors are leaking in #796, and this may even be a related issue.
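
To make the growth in the table concrete, this short sketch computes the step between consecutive processes, using the VmSize and VmRSS values of the five processes on pts/3 through pts/9 in PID order. The units are the "G" and "M" that procs prints; the function name deltas is just for this example. The steps come out at about 12 G of virtual size and about 17 M of RSS per additional tab.

```rust
/// Returns the difference between each pair of consecutive samples.
fn deltas(samples: &[f64]) -> Vec<f64> {
    samples.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    // Values copied from the procs output above (pts/3, pts/4, pts/6, pts/7, pts/9).
    let vm_size_g = [13.564, 25.629, 37.695, 49.761, 61.889];
    let vm_rss_m = [31.113, 48.949, 65.711, 82.047, 99.004];

    for (label, unit, samples) in [("VmSize", "G", &vm_size_g), ("VmRSS", "M", &vm_rss_m)] {
        let steps = deltas(samples);
        let mean = steps.iter().sum::<f64>() / steps.len() as f64;
        println!("{} step per tab: {:.3?} {}, mean {:.3} {}", label, steps, unit, mean, unit);
    }
}
```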

@a-kenji
Contributor

a-kenji commented Oct 25, 2021

Hey!
Thank you for taking the time to create these awesome issue reports!

> Also, I've just noticed that simply creating a few tabs (running zsh, if that matters) spawns processes, and with each new one both virtual and RSS memory grow:
> This pattern looks very similar to how descriptors are leaking in #796, and this may even be a related issue.

It could be. It could also be a compounding issue, e.g. the descriptor leak together with the plugin duplication.

Currently we don't have a primitive for sharing plugins across different views, so if you are using plugins (for example in the default layout), they are still copied, and that could also explain part of the RAM usage with many tabs.

tailhook changed the title from "Zelling is killed by OOM" to "Zellij is killed by OOM" on Oct 25, 2021
@tailhook
Author

Yes. I use default layout.

So do the plugins run in child processes or in the main process? (And are the children inheriting the plugins' memory just because of the fork?)

@tailhook
Author

Also, can I use zellij without plugins at all? (I think the tab bar and status bar are plugins too, right?)

@a-kenji
Contributor

a-kenji commented Oct 25, 2021

> So do the plugins run in child processes or in the main process? (And are the children inheriting the plugins' memory just because of the fork?)

I am sadly not that knowledgeable about the plugin system myself.
I am pretty sure the way it works is that the plugins are executed by the wasm_vm and have separate memory, but pass events between the main program and the VM. So they should be run in a child process.
cc @TheLostLambda

> Also, can I use zellij without plugins at all? (I think the tab bar and status bar are plugins too, right?)

Yes, you can. Though I think it is not that comfortable to do it completely without plugins yet.

Basically the layout-template is describing the plugins in the default context.
You can see the default layout with zellij setup --dump-layout default.
I personally run zellij without the status-bar, but with the tab-bar on the bottom.
If you put your layout configuration in ~/.config/zellij/layouts/default.yaml then zellij will choose your default layout at startup.
You can also check the directory with zellij setup --check.

Architecture-wise, we are fairly close to making it possible to share plugins across multiple tabs. But I am not sure how far along we are in general, because several other points have priority for many people right now.

@tailhook
Author

> > So do the plugins run in child processes or in the main process? (And are the children inheriting the plugins' memory just because of the fork?)
>
> I am sadly not that knowledgeable about the plugin system myself. I am pretty sure the way it works is that the plugins are executed by the wasm_vm and have separate memory, but pass events between the main program and the VM. So they should be run in a child process. cc @TheLostLambda

So looking at the code here:

It looks like most of the work happens in the main server process, and the forked process just waits for the user-specified process to exit. To make it use less memory (the forked child holds a copy-on-write copy of the parent's memory), it should do one of two things: quickly libc::execve into something like zellij --process-handler <process to run>, or libc::execve directly into the user process and handle that child in the parent process.

In fact, as far as I can see, zellij already does most of the work in the parent server process. So it might be better to use openpty instead of forkpty and then spawn the child with the normal std::process::Command, passing it the stdio handles.

This should save tons of memory together with plugin sharing.
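
A minimal sketch of the parent-side spawning suggested above, using only the standard library. Pipes stand in for the PTY here: in the proposed design the slave end returned by openpty would be handed to the child as stdin/stdout/stderr instead, and that part is not shown. The function name run_with_piped_stdio and the use of cat as the child are illustrative assumptions, not zellij code.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};

/// Spawns `program` from the parent process, feeds it `input` on stdin,
/// and returns what it wrote to stdout plus whether it exited successfully.
/// Only suitable for small inputs: all input is written before any output is read.
fn run_with_piped_stdio(program: &str, args: &[&str], input: &[u8]) -> io::Result<(String, bool)> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())  // a PTY slave handle would go here instead
        .stdout(Stdio::piped()) // and here
        .spawn()?;

    {
        // Write the input, then drop the handle so the child sees EOF.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(input)?;
    }

    let mut output = String::new();
    let mut stdout = child.stdout.take().expect("stdout was piped");
    stdout.read_to_string(&mut output)?;

    // The parent itself waits for the child; no helper process is needed.
    let status = child.wait()?;
    Ok((output, status.success()))
}

fn main() -> io::Result<()> {
    let (output, ok) = run_with_piped_stdio("cat", &[], b"hello from the parent\n")?;
    println!("child succeeded: {}, output: {:?}", ok, output);
    Ok(())
}
```

Since Command replaces the new child with the target program right away, nothing resembling a second copy of the server should be left sitting around for each tab, which is the saving the suggestion is after.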

@imsnif
Member

imsnif commented Oct 25, 2021

The openpty suggestion sounds promising. As I understand it, if we do that we won't need to muck about with libc::execve and such, right?

@tailhook
Author

> The openpty suggestion sounds promising. As I understand it, if we do that we won't need to muck about with libc::execve and such, right?

Yes, I think so.

@imsnif
Member

imsnif commented Oct 25, 2021

Cool, I'll try to whip up something in the near future.

@a-kenji
Contributor

a-kenji commented Oct 25, 2021

Thank you @tailhook this is awesome input!

@imsnif
Member

imsnif commented Nov 1, 2021

Hey @tailhook, I just merged a fix for this into main. If you'd like to take a look at the changes, this is the PR: #830

This also addresses #796.

Thanks for the tips! This was quite a bender but I learned a ton.

@tailhook
Author

tailhook commented Nov 1, 2021

Great! I'm not familiar enough with the codebase to do a good review, but it generally looks okay.

tailhook closed this as completed Nov 1, 2021