Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The spawned X server stays alive when xpra killed before a display number is assigned #2032

Closed
totaam opened this issue Nov 6, 2018 · 21 comments
Labels

Comments

@totaam
Copy link
Collaborator

totaam commented Nov 6, 2018

Issue migrated from trac ticket # 2032

component: server | priority: major | resolution: fixed

2018-11-06 08:50:24: berserker created the issue


The reason is that the function killing the X server is added to the Xpra exit cleanup list only after the display number is assigned, see
[/browser/xpra/trunk/src/xpra/scripts/server.py?rev=20254#L774] and
[/browser/xpra/trunk/src/xpra/scripts/server.py?rev=20254#L818].

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 14:22:51: antoine changed priority from critical to major

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 14:22:51: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 14:22:51: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 14:22:51: antoine commented


It's not easy to change this, the kill_display function is also used for display upgrades.

I don't see how this bug is critical: don't kill xpra shortly after starting it and this won't be an issue.

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 14:29:54: berserker commented


I don't see how this bug is critical

When starting Xpra, sporadically I get the following error:

xpra initialization error:
xpra_Xdummy: did not provide a display number using displayfd

Terminating Xpra does not terminate X server in this case! And you "don't see how this is critical"...

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 15:09:49: antoine commented


xpra_Xdummy: did not provide a display number using displayfd
This can happen if your system is really slow.
You can try to change the default timeout:

XPRA_DISPLAY_FD_TIMEOUT=30 xpra start ...

@totaam
Copy link
Collaborator Author

totaam commented Nov 6, 2018

2018-11-06 16:56:32: berserker commented


This can happen if your system is really slow.

Well, kinda. But the question really is why it lets Xorg to continue running after the error occurs?

@totaam
Copy link
Collaborator Author

totaam commented Nov 12, 2018

2018-11-12 11:38:25: antoine changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented Nov 12, 2018

2018-11-12 11:38:25: antoine changed owner from antoine to berserker

@totaam
Copy link
Collaborator Author

totaam commented Nov 12, 2018

2018-11-12 11:38:25: antoine commented


Well, kinda.
what other possibility is there that I am missing?

But the question really is why it lets Xorg to continue running after the error occurs?
Because no-one hit that bug until now?

Fixed for me in r20968. Tested with an unreasonably low timeout value:

XPRA_DISPLAY_FD_TIMEOUT=1 python2 /usr/bin/xpra start --no-daemon -d x11

@berserker: please close if that works for you.

@totaam
Copy link
Collaborator Author

totaam commented Nov 13, 2018

2018-11-13 23:13:04: berserker changed owner from berserker to Antoine Martin

@totaam
Copy link
Collaborator Author

totaam commented Nov 13, 2018

2018-11-13 23:13:04: berserker commented


what other possibility is there that I am missing?

Nothing. The system is indeed slow and on top of that users have CPU limits. This makes it easy to trigger the timeout if try to start several Xpra servers simultaneously.

please close if that works for you

Ehm... r20968 is not a fix for the bug reported, even if it solves the problem with timeouts (not checked yet).

@totaam
Copy link
Collaborator Author

totaam commented Nov 14, 2018

2018-11-14 05:07:06: antoine changed owner from Antoine Martin to berserker

@totaam
Copy link
Collaborator Author

totaam commented Nov 14, 2018

2018-11-14 05:07:06: antoine commented


Ehm... r20968 is not a fix for the bug reported
It is meant to be.

even if it solves the problem with timeouts (not checked yet).
Timeouts are the only unhandled error path I could find that could cause the cleanups to be skipped.

If you can still reproduce a problem, please post the full output of:

XPRA_ALL_DEBUG=1 xpra start ....

@totaam
Copy link
Collaborator Author

totaam commented Nov 14, 2018

2018-11-14 10:12:22: berserker changed owner from berserker to Antoine Martin

@totaam
Copy link
Collaborator Author

totaam commented Nov 14, 2018

2018-11-14 10:12:22: berserker commented


Replying to [comment:7 Antoine Martin]:

Timeouts are the only unhandled error path I could find that could cause the cleanups to be skipped.

There is also Ctrl+C.

@totaam
Copy link
Collaborator Author

totaam commented Nov 22, 2018

2018-11-22 09:26:32: antoine changed owner from Antoine Martin to berserker

@totaam
Copy link
Collaborator Author

totaam commented Nov 22, 2018

2018-11-22 09:26:32: antoine commented


There is also Ctrl+C.
Right. That is only a problem now because of a recent bug somewhere in python's atexit: #1943 (this used to work fine before, no idea when / what library broke it)

  • r21065 is yet another workaround for atexit
  • r21066 ensures we kill the vfb if the start_Xvfb terminates abnormally

@totaam
Copy link
Collaborator Author

totaam commented Nov 30, 2018

2018-11-30 10:21:39: antoine changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Nov 30, 2018

2018-11-30 10:21:39: antoine set resolution to fixed

@totaam totaam closed this as completed Nov 30, 2018
@totaam
Copy link
Collaborator Author

totaam commented Dec 27, 2018

2018-12-27 20:43:55: antoine commented


See also: #2091 which explains why we now hit the timeout more often than before.

@totaam totaam added the v2.4.x label Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant