Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with repeatedly attaching to session - leaks memory and finally crashes #1948

Closed
Tracked by #1568
raphCode opened this issue Nov 16, 2022 · 11 comments · Fixed by #2328
Closed
Tracked by #1568

Problems with repeatedly attaching to session - leaks memory and finally crashes #1948

raphCode opened this issue Nov 16, 2022 · 11 comments · Fixed by #2328

Comments

@raphCode
Copy link
Contributor

Just some stress testing with rapid attaching and detaching yielded this possible server crash locations:

Error occurred in server:

  × Thread 'server_listener' panicked.
  ├─▶ At .../zellij-utils/src/ipc.rs:213:37
  ╰─▶ called `Result::unwrap()` on an `Err` value: EMFILE

My build is at dc92290

@har7an
Copy link
Contributor

har7an commented Nov 21, 2022

More information: It seems EMFILE is unix errno 24, which means "Too many open files". Can you try reproducing this in a fresh user session? Maybe we have a problem with not closing file descriptors correctly?

@raphCode
Copy link
Contributor Author

raphCode commented Nov 21, 2022

Repro:

  • in terminal 1: open zellij session
  • in terminal 2: while true; do timeout 0.1 zellij attach; done

This also leaks memory :)

Here is a version that counts how many attaches were possible:
n=0; while true; do timeout --preserve-status 0.1 zellij attach || break; ((n++)); done; echo $
I get exactly 252 attaches before it crashes.

My build is now 5ad0429

@raphCode raphCode changed the title Possible server crash in ipc.rs Problems with repeatedly attaching to session - leaks memory and finally crashes Nov 21, 2022
@matu3ba
Copy link
Contributor

matu3ba commented Jan 11, 2023

My gut feeling tells me that there might be a fork without CLOEXEC used (either no OCLOEXEC set or no execv being used leading to file descriptor leaking into the child process), but this does not explain the memory leaks.

Background: Rust libstd defaults to a mutex on process spawn, so other threads in the process should not leak it. See also rust-lang/rust#38227.

@raphCode
Copy link
Contributor Author

raphCode commented Feb 1, 2023

This bug is also triggered when using something like zellij run or zellij ls since that behaves like a temporary client or at least connects to the server processes.
Even worse, this seems to "increment the attach counter" for all sessions visible so they will eventually crash.
Demo:
while true; do zellij ls; done
while true; do zellij --session aaaa run --close-on-exit -- echo; done
Any of these will reliably crash all your sessions :)

@mfasold
Copy link

mfasold commented Feb 9, 2023

I ran into this issue in practice. We had a number of users running zellij on a single machine until all instances crashed at some point of time. The logs showed the same EMFILE error.

Later I checked the number of open files, and found that zellij was using about 50k open file descriptors! The majority (85%) of them were Unix socket files for the server-client communication. About half of them were used by the "async-std" thread.

Any ideas on how to get to the root of this issue? Would it be of any merit to build a server-client toy example that uses a LocalSocketListener in a separate thread?

gbrigandi added a commit to gbrigandi/zellij that referenced this issue Mar 23, 2023
@gbrigandi
Copy link
Contributor

Just submitted a PR that fixes this issue, see #2322 .

@raphCode
Copy link
Contributor Author

raphCode commented Mar 24, 2023

Nice! It fixes the crashes.

I fear we might still have a small memory leak tho:

  • zellij, default layout with 1 tab, 1 pane, statusbar and tabbar
  • directly after start + detach: 631MB resident set size
  • 1000x zellij ls: 655MB
  • 5000x: 751 MB

This works out to 24kB leaked for each client / attach.

@imsnif
Copy link
Member

imsnif commented Mar 24, 2023

@raphCode - these might not strictly be leaks, I think @tlinford has done some digging into the rust allocator and might have a relevant link handy?

@tlinford
Copy link
Contributor

@raphCode - these might not strictly be leaks, I think @tlinford has done some digging into the rust allocator and might have a relevant link handy?

Here is what I found about allocators in another issue: #1625 (comment)

@tlinford
Copy link
Contributor

Btw i pushed the branch with jemalloc as allocator here: https://github.com/zellij-org/zellij/tree/jemalloc

@raphCode
Copy link
Contributor Author

I am aware of the allocator playing a role, but this case looks like a leak to me:
Memory rises linearly with attach count, no anomalies. I performed 5000 attach repetitions. Or 5 repetitions of the '1000x attach' case, if you want.

Tell me if any of these would be helpful:

  • rechecking with jemalloc branch
  • running it under valgrind
  • letting the test run longer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants