session becomes unusable after network timeout #1781
Comments
I am sorry for the experience and I understand the frustration of lost sessions. As an immediate workaround, you can try replacing ssh with mosh. It keeps the session alive across network interruptions, so you don't need to reconnect. All programs keep running, so zellij won't get messed up either. To better understand the bug, can you give more information? What do you mean by "everything is messed when I come back"? Also, it would be helpful if you could post the logfiles in
What I have tried today is to set "Seconds between keepalives" to 10 in the "Connection" section of PuTTY and to tick "Enable TCP keepalives". Nevertheless, the connection to the VM still breaks after some time when I am away from my notebook. But I have tried again to use "kill -9" on the "zellij attach" process and then reattach. This has worked twice today. I will have to see whether this always works or breaks again. In the past (maybe with older versions) it also worked sometimes, but after a while everything began to "freeze", so I used to restart everything from scratch to avoid such freezes in the middle of my work. Another option I have tried was to run zellij locally in an Alpine container under WSL2 and SSH from there. But this again gives very bad copy-and-paste behaviour (this time not due to zellij but due to the WSL2 shell environment), so I went back to SSHing into a Linux VM and running zellij there.
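For readers unfamiliar with what a TCP keepalive setting does at the socket level, here is a minimal sketch in Python. It is not PuTTY's code; the function name `make_keepalive_socket` and its parameters are made up for illustration, and the per-connection tuning options are only applied where the platform's `socket` module exposes them. As far as I know, PuTTY's "Seconds between keepalives" is a separate, SSH-level mechanism, so this only approximates the "Enable TCP keepalives" checkbox.

```python
import socket


def make_keepalive_socket(idle_s: int = 10, interval_s: int = 10, probes: int = 3) -> socket.socket:
    """Create an unconnected TCP socket with keepalive probing switched on.

    Hypothetical helper for illustration only; the defaults mirror the
    10-second value mentioned in the comment above.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to probe an idle connection instead of waiting forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timing options are platform dependent, so set only
    # those that this Python build actually defines.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket()
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keepalives only help detect or prevent idle timeouts on a working network path; they cannot keep a connection alive while the machine itself is asleep.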
One part of the problem is the way Windows behaves in sleep mode. I have made several tests and found that SSH connections over IPv6 break when Windows goes to sleep. This is the message I get:
I have tested against different machines with IPv4 and IPv6. SSH breaks only for IPv6. For example, when I log in over IPv4 and IPv6 in parallel, the IPv4 connection is still there after coming back from sleep mode while the IPv6 connection is broken. It behaves the same in PuTTY and PowerShell. There is an option in the Device Manager for the interface ("Power Management") called "Allow the computer to turn off this device to save power". This is enabled by default. I have tried disabling it, but this has no effect. Of course this is not the fault of zellij. But so far I don't know why Windows behaves this way.
@fansari are you using wifi? If so, check to be sure that your wifi adapter does not go to sleep when you walk away...
My notebook is connected via cable to a switch. It is still unclear to me why the Windows sleep mode kills IPv6 while IPv4 survives.
Could it be that TCP connections over IPv6 have a different timeout / close logic? Please describe what the situation is after the ssh connection breaks:
I must admit I have avoided this because I had bad experiences. Sometimes I can reattach, but then later things begin to freeze, and if this happens in the middle of something important it is not what I like. Sometimes I see the "zellij attach" process after reconnecting, and sometimes only the server is still running. When I see the "zellij attach" process I kill it, because in my experience, when I don't do this and try to attach, it hangs right away. When I kill the "zellij attach" process, as far as I remember this sometimes also kills the server process. When the server process survived and the attach process got killed, I sometimes reattached and it worked for a while. But as I said: this does not last long, and the moment you least expect it the whole thing is frozen and you can start all over again. So for the last weeks, whenever I lost the connection I directly killed all of zellij. And since there is no answer from the Microsoft community so far, there are only two options left: either don't work with IPv6 or disable sleep mode. Or a third option, if you have this possibility: don't work with Windows.
I am sorry to see that the current zellij behavior is not suitable for doing actual work in your situation. But the only way of us having a chance at fixing it is to understand the problem. Right now I am not even sure I correctly understand the error situation. You mention "freezing" and "hanging". My interpretation of these words:
Does this correspond to your understanding of these terms as well? Maybe you can start a sacrificial zellij session next to your working session where you can observe the behavior and record logs from
With "freezing" you are right. With "hanging" I am not absolutely sure how it behaved. I think it did not open at all. My current workaround is to disable sleep mode in Windows. This way my IPv6 connection is not disconnected when I am away from the keyboard for more than a few minutes. There is still no answer in the Microsoft forum as to why this happens.
Yeah, "hanging" includes that nothing opens and the only way out is Regarding the freezing, at first it seemed to me that this is related to network packets not getting delivered, but after re-reading #1588, #1209 and this issue, I don't think it is that.
After I had a network drop I saw both "zellij --server" and "zellij attach" in the process list. I ran "zellij attach" without killing the existing "zellij attach". After this I was able to get my session back. After a few seconds the keyboard did not react anymore.
Thanks for the log file! :) @imsnif Maybe you are interested in the log, although I can't read anything too useful from it in the described time range. Maybe the previous part of the log gives a hint, I see quite a lot going on there.
The issue actually was right in front of us all the time :) Based on my finding today about the bad handling of a killed client (#1949) I suspected that we have more problems with clients misbehaving or not reacting.
If no applications have produced output since the network problem occurred, the behavior is particularly devilish: one can attach a second client and it works for a bit, then everything freezes when the buffers fill up. I can see why this is frustrating enough to kill everything and start over. With this theory it's quite easy to reproduce:
The first zellij session is still usable for some time (or should I say... for some terminal bytes? ;) ) until the write buffers on the suspended client fill up, freezing everything. Hey @imsnif, your turn now ;)
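The mechanism described above (a peer that stops reading, a finite buffer, then a stall) can be illustrated outside of zellij with a small sketch. This is not zellij's code: the function `bytes_until_stall` and the names `server_side` and `suspended_client` are hypothetical, and a local socket pair stands in for whatever channel the server uses to talk to a client. The writer is put in non-blocking mode so the sketch reports the point at which a blocking write would have hung instead of actually hanging.

```python
import socket


def bytes_until_stall(chunk_size: int = 4096, limit: int = 64 * 1024 * 1024) -> int:
    """Write to a peer that never reads and return how many bytes were
    accepted before the next write would have blocked.

    Hypothetical demonstration only; `limit` is a safety cap so the loop
    always terminates.
    """
    server_side, suspended_client = socket.socketpair()
    # Non-blocking mode turns "would hang forever" into BlockingIOError.
    server_side.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while sent < limit:
            try:
                # suspended_client never calls recv(), like a stopped client.
                sent += server_side.send(chunk)
            except BlockingIOError:
                # Buffers are full: a blocking writer would freeze here.
                break
    finally:
        server_side.close()
        suspended_client.close()
    return sent


if __name__ == "__main__":
    n = bytes_until_stall()
    print(f"{n} bytes were accepted before a blocking write would have stalled")
```

The exact number of bytes depends on the operating system's buffer sizes. The point is only that it is finite, which matches the observation that things keep working "for some terminal bytes" and then freeze.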
Fantastic analysis @raphCode !! I'm reproducing this and found the issue. Now mostly trying to figure out the best way to solve it. Will keep this thread posted.
I connect to a VM via PuTTY. There I open a zellij session.
If the PuTTY session has a network timeout and loses the connection, the zellij session becomes unusable.
The only workaround for me so far is to close the session before I leave my notebook (the timeout typically happens when I am away for more than just a few minutes). Then I can reconnect via PuTTY and attach zellij again. But when I don't detach from the zellij session, everything is messed up when I come back.
I think this or something similar was reported here several times but it is still not fixed.
Tested with v0.31.4.