Crash when resuming or creating a new session #3657

Open
kahilah opened this issue Oct 9, 2024 · 11 comments

kahilah commented Oct 9, 2024

  1. Issues with the Zellij UI / behavior / crash

Basic information

OS: centos 8
terminal: gnome-terminal
zellij --version: 0.40.1

Issue description

I have been using Zellij for a week now (default settings, no plugins) and started experiencing these crashes quite quickly. Every crash has happened when either 1) resuming a session or 2) creating a new session.

Zellij had been running for several hours each time the crash happened. No other common factors have been identified.

The crash results in the following error message:

  × Thread 'wasm' panicked.
  ├─▶ Originating Thread(s)
  │   	1. main_thread: SwitchSession
  │   	2. ipc_server: NewClient
  │   	3. screen_thread: NewTab
  │   	4. plugin_thread: NewTab
  │   
  ├─▶ At .cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.3.1/src/init.rs:39:18
  ╰─▶ cannot spawn executor threads: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
  help: If you are seeing this message, it means that something went wrong.
        
        -> To get additional information, check the log at: /tmp/zellij-43927/zellij-log/zellij.log
        -> To see a backtrace next time, reproduce the error with: RUST_BACKTRACE=1 zellij [...]
        -> To help us fix this, please open an issue: https://github.com/zellij-org/zellij/issues

The log file under /tmp is filled with messages equivalent to this one:
ERROR |zellij_server::background| 2024-10-09 21:55:09.611 [async-std/runti] [.cargo/registry/src/index.crates.io-6f17d22bba15001f/zellij-server-0.40.1/src/background_jobs.rs:443]: Failed to read created stamp of resurrection file: Error { kind: Unsupported, message: "creation time is not available for the filesystem" }

Minimal reproduction

I haven't been able to reproduce this deterministically.

Other relevant information

The error message is similar whether the crash happens when opening an old session or creating a new one.

Restarting Zellij produces the same error message; I have to kill the existing zellij processes before it will start again.

imsnif self-assigned this Oct 10, 2024
Author

kahilah commented Oct 26, 2024

I'd like to add that during the past 2 weeks I have been able to mitigate the crashes by keeping the number of sessions small (2-5) and actively deleting sessions that I haven't touched for days.

Member

imsnif commented Oct 31, 2024

Hey @kahilah - I looked a little bit into this and can't see an immediate cause from these details. Seems like for some reason the async executor can't spawn more threads.

Combined with the logs you provided regarding reading the creation time, a wild guess on my part is that this involves a problem with the generic musl binary. Did you install Zellij in this way (eg. with the Try Zellij before Installing method)? No harm in it, of course - it should work as expected.

If so, would you be willing to try compiling it for your own system (eg. with cargo install --locked zellij)? It might help identify the issue.
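
For readers following the thread: the `code: 11` in the panic is a raw OS error number. Below is a minimal Python sketch (purely illustrative, not part of Zellij) that decodes such a code with the standard `errno` module; `describe_os_error` is a hypothetical helper name. On Linux, 11 is EAGAIN, which `pthread_create` reports when a thread cannot be created because a resource limit was hit, and that fits an executor that cannot spawn more threads.

```python
import errno
import os


def describe_os_error(code: int) -> tuple[str, str]:
    """Return (symbolic name, message) for a raw OS error code."""
    # errno.errorcode maps numeric codes to names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is the code from the panic above; the mapping is platform-specific.
    name, message = describe_os_error(11)
    print(f"code 11 -> {name}: {message}")
```

The numeric value differs between operating systems, so the lookup should be run on the machine that produced the panic.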


tgulacsi commented Nov 3, 2024

What's your "uname -a"? Maybe something limits the number of threads/file descriptors?

Author

kahilah commented Nov 3, 2024

Hi, thanks for the suggestions. I have always installed Zellij by compiling it with cargo, so that shouldn't be the problem. Kernel info on this particular machine via uname: Linux XXXX 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Wed Apr 27 15:32:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


tgulacsi commented Nov 4, 2024

Sorry, I meant "ulimit -a" ...

Author

kahilah commented Nov 5, 2024

Ah I see. The ulimit shows the following:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8204480
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
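
On Linux, the "max user processes" limit (RLIMIT_NPROC) counts threads as well as processes for a user, so 4096 can be reached with far fewer than 4096 processes. Below is a rough sketch, assuming a Linux /proc filesystem, that compares the current user's thread count against the soft limit. `count_user_threads` is a hypothetical helper, not a Zellij or system tool.

```python
import os
import resource
from pathlib import Path


def count_user_threads(uid: int, proc_root: str = "/proc") -> int:
    """Sum the 'Threads:' field of every process under proc_root owned by uid."""
    root = Path(proc_root)
    if not root.is_dir():
        return 0  # no procfs available (for example, not Linux)
    total = 0
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            text = (entry / "status").read_text(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        fields = dict(
            line.split(":", 1) for line in text.splitlines() if ":" in line
        )
        uid_field = fields.get("Uid", "").split()
        threads_field = fields.get("Threads", "").strip()
        # "Uid:" lists real, effective, saved and filesystem UIDs; use the real one.
        if uid_field and threads_field and int(uid_field[0]) == uid:
            total += int(threads_field)
    return total


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_user_threads(os.getuid())
    limit = "unlimited" if soft == resource.RLIM_INFINITY else soft
    print(f"threads owned by this user: {used} (soft limit: {limit})")
```

Running this shortly before attaching to an old session would show whether the count is already close to the limit.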

Author

kahilah commented Nov 25, 2024

Some additional information:

  1. I have been using a single session continuously for over 3 weeks without crashes (i.e. a single session is very robust).
  2. I closed this long-running session and noticed that its latest state was not saved to the attachable sessions.
  3. I had a few other very old sessions there (3 weeks or so), so I tried switching to them: a few worked, but one caused Zellij to crash with an error message equivalent to the one in the first message above.

After this crash (3) I cannot open Zellij, as it keeps giving the same error. I need to kill the existing zellij processes, and then it works.

Author

kahilah commented Nov 25, 2024

Another note: I just realised that if I try to remove everything from .cache/zellij, the 0.40.1/session_info directory remains. In fact, individual old session files there are regenerated immediately after I remove them. I have killed all zellij processes manually, but this still happens. It sounds to me like some rogue process is doing this?

Author

kahilah commented Nov 25, 2024

Note to self: after a system logout and login, ps -fC found a few more zellij processes that needed killing. After that, removing the cache was successful.

I'll update to latest release version and keep experimenting.
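
Below is a small sketch of the kind of check that `ps -fC zellij` performs, for listing leftover processes before clearing the cache. It assumes a Linux /proc filesystem and matches on the kernel's short command name. `find_processes` is a hypothetical helper.

```python
from pathlib import Path


def find_processes(name: str, proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, command line) for each process whose command name equals name."""
    root = Path(proc_root)
    matches: list[tuple[int, str]] = []
    if not root.is_dir():
        return matches  # no procfs available (for example, not Linux)
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            comm = (entry / "comm").read_text(errors="replace").strip()
            if comm != name:
                continue
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        # cmdline arguments are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        matches.append((int(entry.name), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    for pid, cmdline in find_processes("zellij"):
        print(pid, cmdline)
```

Any PID still listed after all sessions are closed is a candidate for the leftover server process that keeps rewriting the session_info files.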

Member

imsnif commented Nov 26, 2024

Hey @kahilah - it's a little hard for me to keep track at this stage. Any chance for a summary of your findings?

Author

kahilah commented Nov 27, 2024

Sorry for the convoluted issue. As a summary:

With version 0.40.1 I experienced crashes, with the error message shown in the first message, when switching between sessions or when resuming a session that had been created several hours or days earlier. The crash happens at the moment of switching.

I upgraded to the latest version two days ago and have had no crashes yet, but my session usage has been limited.
