Server hangs in XSync()
#3503
Comments
Please specify more details about your environment, versions, etc. As per: I see this in your backtraces which tells me that this is not a standard setup: It would help to have debug symbols and to know where in the python event loop code it is failing (not the cython
I doubt this is the same problem as #475, but you could always try:
/opt/xpra/usr/bin/xpra start --daemon=no \
--chdir=/home/mdavidsaver --start=/usr/local/bin/perpetual-xterm \
--terminate-children=yes --mdns=no \
--bind-tcp=0.0.0.0:14500 --tcp-auth=sys \
:10
I'm seeing this issue in conjunction with the html5 client. I have a group of ~20 users, and only 4 report hangs, though each of these has had multiple occurrences. These users run a variety of browsers (Safari, Firefox, Chrome), and I don't yet see any commonality. The first symptom is that the xpra server process "freezes". e.g. I then see that new http connections are not My searches for "xsync hang" and "gdb_flush hang" have not been helpful. Reading the man page for I guess I can get stack traces from the Could this be triggered by a misbehaving X client application? I'm working with a java/openjfx application, which I know to be troublesome wrt. gtk usage. I'm using xpra in part because the combination seems to have the fewest glitches. So I'm not sure if a stack trace would show if, e.g., some client application has grabbed the server.
I'm running a local build of the git revisions mentioned above against Debian-packaged dependencies. The only local change is to
I'm planning to rebuild xpra, passing
I concur. I linked that issue because it is the only other mention of
Ah. Those are notoriously flaky.
It would not - it would look exactly the same as what we have here.
I had one more occurrence, from which I am able to collect a little more information. I am able to leave things running in the hung state for the time being, so I could perform additional postmortem tests if any come to mind. I was able to capture stack traces of all processes associated (by systemd) with this xpra instance. Unfortunately, while I did install some Debian debug info packages, it looks like I didn't point to a debug build of xpra (oops...). This may be moot, as the Xvfb process appears to be idling normally. I also don't see anything abnormal in the 4 (of 71) threads in the java/jfx application making glib/gtk calls. (I'll continue looking at the java process as there is a reasonable chance I'm missing something) I also checked (with netstat and ss) the state of the various socket buffers. The TX/RX queues for all of the unix domain connections are empty, including the X related ones. This is consistent with Xvfb idling normally. (maybe it could be inspected by some X client?)
The TCP connection queues are not, which is as expected with the GIL being locked for the
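The socket-buffer check described above can also be scripted. A minimal sketch, assuming a Linux host: /proc/net/tcp lists each IPv4 TCP socket with hex IP:PORT addresses and a tx_queue:rx_queue column of hex byte counts, the same data ss and netstat show as Send-Q/Recv-Q. The function name parse_tcp_queues is hypothetical, port 14500 is taken from the --bind-tcp option in the start command above, and IPv6 sockets (in /proc/net/tcp6) are not covered.

```python
import os


def parse_tcp_queues(lines, port):
    """Return (local, remote, state, tx_queue, rx_queue) rows for sockets using `port`.

    `lines` is the text of /proc/net/tcp split into lines; the first line is a header.
    Addresses are hex "IP:PORT"; the fifth column is hex "tx_queue:rx_queue".
    """
    results = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        local, remote, state, queues = fields[1], fields[2], fields[3], fields[4]
        local_port = int(local.split(":")[1], 16)
        remote_port = int(remote.split(":")[1], 16)
        if port not in (local_port, remote_port):
            continue
        tx_hex, rx_hex = queues.split(":")
        results.append((local, remote, state, int(tx_hex, 16), int(rx_hex, 16)))
    return results


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        rows = parse_tcp_queues(f.read().splitlines(), 14500)
    for local, remote, state, tx, rx in rows:
        # state is hex: 0A = LISTEN, 01 = ESTABLISHED
        print(f"{local} -> {remote} st={state} tx_queue={tx} rx_queue={rx}")
```

On an established connection, a non-zero rx_queue means bytes have arrived that the process has not read yet; on the LISTEN row the kernel reports the number of connections waiting to be accepted instead, which would match new http connections not being served by a blocked server.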
Also, it looks like we're running Sun JDK 11.0.2 atm., which of course has no debug symbols... OpenJDK 17.0.2 is also installed, and I thought it was being used. Sigh... maybe next time. Finally, it is unlikely I'll be able to trigger this hang again in the near term. I haven't been able to do so myself, and the event which provided additional users (a training class) has ended. My suspicion atm. is that the xpra hang is somehow a side effect of misbehavior by OpenJFX. As you say, gtk support in jfx is notoriously buggy. (I've looked at the gtk2/3 binding code for both OpenJFX and SWT, and both are nightmarish rat's nests!) So this ticket could be closed if, as I expect, nothing further can be learned from the information I have provided.
I was wrong when I said:
As per my previous comment: #3503 (comment)
Without that, I can only suggest running with:
XPRA_X11_DEBUG_EVENTS=all xpra start ...
This is going to generate a huge amount of debug logging, but it may show us the event that's triggering the bug.
wrt. X server locking. Is there some way I can probe this without restarting the Xvfb process? How complete would this lockout be? eg. could something like
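One way to probe the X server without restarting Xvfb, sketched here under assumptions: run an ordinary X client such as xdpyinfo against the display with a timeout. While another client holds a server grab, the X server does not process requests from other connections, so the probe should time out; if it returns promptly, the X server is answering and the blockage is more likely on the xpra side. The probe helper and its 5 second default are hypothetical choices, and :10 is the display from the start command above.

```python
import subprocess


def probe(cmd, timeout=5.0):
    """Run an X client command and classify the result.

    Returns "ok" (exit status 0), "error" (non-zero exit), "hung" (killed after
    `timeout` seconds) or "missing" (the executable was not found).
    """
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child before raising this.
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Hypothetical check against the xpra display :10.
    print(probe(["xdpyinfo", "-display", ":10"]))
```

A "hung" result would point towards a grab or a stuck X server, while "ok" matches the earlier observation that Xvfb appeared to be idling normally.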
Sorry, I didn't pick up on this. The full gdk_bindings.c. The first comment above
/* "xpra/x11/gtk3/gdk_bindings.pyx":1035
* elif etype == PropertyNotify:
* pyev.window = _gw(d, e.xany.window)
* pyev.atom = trap.call_synced(_get_pyatom, d, e.xproperty.atom) # <<<<<<<<<<<<<<
* pyev.time = e.xproperty.time
* elif etype == ConfigureNotify:
*/
also make things consistent and always use an X11 trap sync context so that X11 BadAtom errors will be caught here
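The commit message above makes the code always use an X11 trap sync context so that BadAtom errors are caught there. As a rough model only (FakeDisplay, call_synced and XError below are hypothetical, not xpra's trap API): X requests are buffered and protocol errors come back asynchronously, so a trap has to force a round trip, which is what XSync() does, before it can know whether the wrapped call failed. That round trip is also where a client blocks if the server never replies, which is the call this issue reports hanging in.

```python
class XError(Exception):
    """Raised when an X protocol error is caught inside the trap (hypothetical)."""


class FakeDisplay:
    """Hypothetical model of an X connection: requests are buffered, and any
    protocol error they cause is only reported back after a round trip."""

    def __init__(self, known_atoms):
        self.known_atoms = set(known_atoms)
        self.outstanding = []  # requests sent but not yet processed
        self.errors = []       # errors reported back to the client

    def send_request(self, atom):
        # Asynchronous: returns immediately, no error is visible yet.
        self.outstanding.append(atom)

    def sync(self):
        # Round trip, like XSync(): process every outstanding request and
        # collect the errors they produced. A real XSync() blocks here until
        # the server replies.
        for atom in self.outstanding:
            if atom not in self.known_atoms:
                self.errors.append(f"BadAtom {atom}")
        self.outstanding.clear()


def call_synced(display, fn, *args):
    """Run fn, then sync, so errors caused by fn surface inside this call."""
    display.errors.clear()
    result = fn(*args)
    display.sync()
    if display.errors:
        raise XError(", ".join(display.errors))
    return result
```

Without the sync, a bad request on its own returns normally and its BadAtom error would only surface at some later, unrelated call, outside any trap.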
Yes.
Ah, now that is interesting! The
Feel free to re-open if you can still reproduce with 1e56be6 or later.
Describe the bug
I'm seeing instances of a server hanging in XSync(). I don't yet have a useful stack trace with debug symbols. What I have is attached below, and looks somewhat similar to #475.
To Reproduce
tbd. I'm not yet sure how to trigger this issue. I'm going to rebuild with debug information and wait for another occurrence. Other suggestions for troubleshooting are very welcome.
System Information
Additional context
Please see "reporting bugs" in the wiki section.
instance1.txt
instance2.txt