Project or editor crashes randomly from xcb XInitThreads #75308

Open
Cykyrios opened this issue Mar 25, 2023 · 50 comments

Comments

@Cykyrios
Contributor

Godot version

v4.1.dev.custom_build [0291fcd]

System information

Linux Manjaro, kernel 6.1.19, X11

Issue description

The editor or the running project sometimes crashes with the following error:

[xcb] Unknown sequence number while processing queue
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot.linuxbsd.editor.x86_64: xcb_io.c:278: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.

Crashes are more common while a project is running, but the editor also crashed because of this a couple of times over the past week or so.

I am not using any thread-related functions in my project, and physics/rendering are not threaded. The project I'm working on when this happens is a simple GUI-based game.

Steps to reproduce

This seems to happen fairly randomly.

Minimal reproduction project

N/A

@Calinou
Member

Calinou commented Mar 26, 2023

I haven't been able to reproduce this so far on Fedora 37 KDE (GeForce RTX 4090 with NVIDIA 525.89.02).

What graphics card model, driver version and desktop environment are you using?

Edit: As of November 2023, I've started to be able to reproduce this issue on the same setup as mentioned above (with Fedora 38 and then 39).

@Cykyrios
Contributor Author

Oh right, I forgot about GPU-related info. I have an AMD 7900 XT, running on the open-source amdgpu drivers with Mesa 22.3.5 (amdgpu version is "kernel").
The desktop is Plasma 5.26.5.

@geowarin
Contributor

Saw this happening (only once) on a totally different config:
RTX 2080Ti
archlinux
i3wm

@cg9999
Contributor

cg9999 commented Mar 28, 2023

I have this too, again. Seems very reminiscent of #69352
Happens quite frequently here.
The message varies a bit, last one I got is:

[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.

Godot: v4.0.1.stable.arch_linux
libx11: 1.8.4-1
arch linux 64 bit, kernel 6.2.8-zen1
On a laptop with Intel HD Graphics 620

@DrRevert
Contributor

DrRevert commented Apr 8, 2023

Manjaro kernel version: 6.1.22-1
As for GPUs: AMD RX 6800 XT but also Intel UHD Graphics 770 (CPU i7-12700)
I have my screens connected to integrated graphics as I'm doing some GPU passthrough, this setup used to cause issues during the beta whenever I opened a new window or a submenu.
Happened like 4 times mostly randomly when the editor was idling.

Managed to reproduce it when connected to gdb, adding backtrace as the attachment
gdb.txt

Forgot to mention Godot version: custom build based on 4.0.2 stable

@Eraph

Eraph commented Apr 9, 2023

Seeing the same thing on Manjaro here, I have integrated Intel graphics (Intel i7-1165G7). Gnome on Wayland. Common factor seems to be Arch based distros?

Full backtrace:

handle_crash: Program crashed with signal 11
Engine version: Godot Engine v4.0.2.stable.mono.official (7a0977ce2c558fe6219f0a14f8bd4d05aea8f019)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so(+0x4a90a4) [0x7f5d0658b0a4] (??:0)
[2] /usr/lib/libc.so.6(+0x38f50) [0x7f5d35225f50] (??:0)
[3] /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so(+0x49296b) [0x7f5d0657496b] (??:0)
[4] /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so(+0x4a8d18) [0x7f5d0658ad18] (??:0)
[5] /usr/lib/libc.so.6(+0x38f50) [0x7f5d35225f50] (??:0)
[6] /usr/lib/libc.so.6(+0x878ec) [0x7f5d352748ec] (??:0)
[7] /usr/lib/libc.so.6(gsignal+0x18) [0x7f5d35225ea8] (??:0)
[8] /usr/lib/libc.so.6(abort+0xd7) [0x7f5d3520f53d] (??:0)
[9] /usr/lib/libc.so.6(+0x2245c) [0x7f5d3520f45c] (??:0)
[10] /usr/lib/libc.so.6(+0x319f6) [0x7f5d3521e9f6] (??:0)
[11] /usr/lib/libX11.so.6(+0x3eb8f) [0x7f5d2d888b8f] (??:0)
[12] /usr/lib/libX11.so.6(+0x41995) [0x7f5d2d88b995] (??:0)
[13] /usr/lib/libX11.so.6(_XEventsQueued+0x62) [0x7f5d2d88e642] (??:0)
[14] /usr/lib/libX11.so.6(XFlush+0x1f) [0x7f5d2d86bc1f] (??:0)
[15] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0x4d62051] (??:0)
[16] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0xe792eb] (??:0)
[17] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0x4217f35] (??:0)
[18] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0x4e38160] (??:0)
[19] /usr/lib/libc.so.6(+0x85bb5) [0x7f5d35272bb5] (??:0)
[20] /usr/lib/libc.so.6(+0x107d90) [0x7f5d352f4d90] (??:0)
-- END OF BACKTRACE --

@vypxl

vypxl commented Apr 12, 2023

Having the same issue, Manjaro with Hyprland / Wayland here. Also Godot 4.0.1 stable, libx11 v1.8.4-1, intel integrated graphics.

@ilmagico

Can confirm on Manjaro with kernel 6.1.23, X11 (no wayland) with libx11 1.8.4-1 as well, intel integrated, godot 4.1 compiled from source (from a fork not far from master, but judging from this report the issue is in godot, I could confirm if necessary), backtrace is exactly the same as @Eraph above.

Also, I noticed this is with .NET 7.0.3, while if I download the official stable godot mono from godotengine.org (not from Manjaro's pacman) it never crashes this way, and it's on .NET6, not sure if it matters.

Any other info I could provide to help debug this?

@akien-mga akien-mga added this to the 4.1 milestone Apr 20, 2023
@ju5tevg3niy
Contributor

ju5tevg3niy commented Apr 23, 2023

Same issue.

godot:  4.0.2.stable.official.7a0977ce2
render: Vulkan API 1.3.230 - Forward Mobile - Using Vulkan Device #0: Intel - Intel(R) HD Graphics 620 (KBL GT2)
os:     Gentoo
kernel: 6.1.22
de:     Xfce 4.18 / X11
libX11: 1.8.4-r1

@comminux

comminux commented Jun 5, 2023

It seems that the problem no longer occurs in version 1.8.5 (Arch Linux official extra repository).

UPD. The problem appears again on libX11 1.8.7

@jivvy

jivvy commented Jul 23, 2023

Just had this happen

swaywm
Arch Linux
AMD 6700XT
Godot v4.2.dev.custom_build [f8dbed4]
libx11 1.8.6-1

[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot.linuxbsd.editor.x86_64: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.

@krendil

krendil commented Oct 4, 2023

I'm getting the same error, with a recent libX11 and non-Arch Linux

Godot Engine v4.1.1.stable.custom_build
OpenGL API 4.6 (Core Profile) Mesa 23.1.3 - Compatibility - Using Device: AMD - AMD Radeon RX 6600 (navi23, LLVM 15.0.7, DRM 3.52, 6.3.13_1)

Void Linux
XFCE4 / xfwm 4.18.0_1
libX11 1.8.6_1
libxcb 1.16_1

[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.

@Leshy-YA

Leshy-YA commented Oct 6, 2023

Confirmed on Fedora running KDE with Mesa Intel® Xe Graphics.
It would seem there's a regression in libX11 1.8.7, probably related to https://gitlab.freedesktop.org/xorg/lib/libx11/-/issues/170
Downgrading libX11 to 1.8.4 removes the issue.

@ZwieBit

ZwieBit commented Oct 18, 2023

Can also confirm that a downgrade to libX11 1.8.4 fixed the issue. I already thought that godot is somewhat unstable but now even 4.2 beta1 works like a charm :-)

@Pshy0

Pshy0 commented Oct 19, 2023

I am having similar crashes involving xcb_in.c. They are unpredictable: sometimes crashing the project, sometimes crashing the editor, sometimes just showing up in the logs without a crash.

Ubuntu 23.04
Godot 4.2.dev4.official.549fcce5f
libx11 1.8.4-2ubuntu0.3
libxcb 1.15-1

The error messages are as follows:

[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: ../../src/xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.

or

[xcb] Unknown sequence number while awaiting reply
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: ../../src/xcb_io.c:374: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.

or

godot: ../../src/xcb_in.c:757: xcb_request_check: Assertion `!reply' failed.

It also sometimes crashes without an error message.
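
For anyone collecting these reports, the abort lines all share one layout (program, file, line, function, failed expression), so they can be grouped mechanically. Below is a rough sketch of that, not part of Godot or libX11; `parse_assert_line` is a made-up helper name and the pattern is only inferred from the lines quoted in this thread.

```python
import re

# Layout of the abort lines quoted in this thread:
#   <program>: <file>:<line>: <function>: Assertion `<expr>' failed.
_ASSERT_LINE = re.compile(
    r"^(?P<program>.+?): (?P<file>\S+):(?P<line>\d+): "
    r"(?P<function>\w+): Assertion `(?P<expr>.+)' failed\.$"
)


def parse_assert_line(text):
    """Return the fields of one abort line as a dict, or None if it does not match."""
    match = _ASSERT_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # line number as an integer
    return fields


sample = (
    "godot: ../../src/xcb_io.c:175: dequeue_pending_request: "
    "Assertion `!xcb_xlib_unknown_req_in_deq' failed."
)
print(parse_assert_line(sample))
```

Grouping reports by the `function` and `expr` fields would separate the three variants listed above from each other.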

@YuriSizov YuriSizov modified the milestones: 4.2, 4.3 Nov 15, 2023
@vvvvvvitor

This keeps happening here all the time; it makes the editor basically unusable because of how often it happens. It's really frustrating, to the point of me not wanting to work on my project anymore.

[xcb] Unknown sequence number while awaiting reply
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
4.1.3.x86_64: xcb_io.c:374: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.

@ygingras

ygingras commented Nov 16, 2023

On Ubuntu 23.04, all of Godot 4.1.1, 4.1.2, 4.1.3 crash about three times per hour. If I downgrade xserver-xorg-core from 2:21.1.7-1ubuntu3.1 to 2:21.1.7-1ubuntu3, then the crashes happen only once every few days.

Edit: fixed the version numbers

@Lamby777

@Lamby777 Upgrading libx11 works, too. I compiled and installed the latest libx11 from master on fedora 39 by doing

https://gitlab.freedesktop.org/xorg/lib/libx11/-/tree/master

./autogen.sh
./configure --prefix=/usr
make
sudo make install

and then reboot and I haven't had a crash yet.

it's been annoying me so much that i finally decided to go looking for this thread again... :P

sadly, it doesn't work :(

Not that compiling your own version doesn't work; that I don't know. The actual compiling part doesn't work. I tried ./autogen.sh and it was giving some error about xorg macros not being installed so i installed this package called xorg-util-macros and now it's complaining about some macro XTRANS_CONNECTION_FLAGS being possibly undefined... Is the macro package I installed just outdated? Cuz i just pulled libx11 source from master so maybe they changed some macros that haven't been put onto arch repos yet. Is that even the right package to install? Seems to be, since the error's gone, but idk

@Lamby777

At least the error message apologizes, which I found somewhat amusing.

@imaducklol

I'll add another data point I guess, got the same crash on two different machines:
Godot 4.2.1, Fedora 39 (Gnome 45 on Wayland), libX11 1.8.7, i3-12100F, 5700xt, (Running godot under xwayland)
Godot 4.2.1, Arch (Gnome 46 on Xorg), libX11 1.8.9, i7-1185G7, Iris Xe, (Running godot under xorg)

Happened on both machines at around twice per hour. Wasn't able to pick up on anything specifically that caused them.
I'll report back if I compile libX11 from source and that fixes anything.

@Tichau

Tichau commented May 28, 2024

Don't know if it's useful, but here is a new repro:

[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
TheGuild: ../../src/xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
[1] 631819 segmentation fault ./Path/To/Exe

Debian 12
Gnome 43.9
X.Org version: 1.22.1.9

If any other info is needed, I can edit this post.

@ttencate
Contributor

// NOTE: Generated from Xlib 1.6.9.

That version is almost five years old. Is it possible that we're looking at some ABI incompatibility here?

@akien-mga
Member

// NOTE: Generated from Xlib 1.6.9.

That version is almost five years old. Is it possible that we're looking at some ABI incompatibility here?

That's a good question.

This hypothesis could be tested by someone who can reproduce the issue reliably, by making a custom build with scons use_sowrap=no, which will disable the dynamic library wrappers and link the system libraries instead. To compile successfully, you might need to install more dev libraries (the ones from https://docs.godotengine.org/en/latest/contributing/development/compiling/compiling_for_linuxbsd.html#distro-specific-one-liners, which wasn't updated now that we default to dlopen'ing these deps).

@ttencate
Contributor

Before I read your comment, I tried another track: I replaced the vendored Xlib.h, XKBlib.h and Xutil.h by the ones from my system (Arch Linux, libx11 1.8.9), and re-ran the generator (version cb59cc4fc69a3f05aed6ca6fa998a934788794f4, which is the first one marked as "0.3" in the source) as instructed in the header. The differences are only additions and one replacement of a char* argument by const char* in XkbOpenDisplay. It still crashes.

I can reproduce it fairly reliably at the moment: my game usually crashes within tens of seconds. The editor fares better, but also crashes about once an hour or so.

With use_sowrap=no, it initially seemed a bit better, but after a few minutes it also crashed.

For the record, here are the commands I used to build (it gets simpler without mono):

$ git checkout 4.2.2-stable
$ scons platform=linuxbsd target=editor arch=x86_64 module_mono_enabled=yes use_sowrap=no
$ bin/godot.linuxbsd.editor.x86_64.mono --headless --generate-mono-glue modules/mono/glue
$ ./modules/mono/build_scripts/build_assemblies.py --godot-output-dir=./bin

@ttencate
Contributor

ttencate commented May 29, 2024

Summarizing the reports above:

  • libx11 1.8.2 - OK (unless patched)
  • libx11 1.8.3 - broken
  • libx11 1.8.4 - mixed
  • libx11 1.8.5 - OK (but see below)
  • libx11 1.8.6 - broken
  • libx11 1.8.7 - broken
  • libx11 1.8.8 - unknown
  • libx11 1.8.9 - broken

The only difference between 1.8.5 and 1.8.6 is 304a654, which seems unrelated to me. So I'm inclined to assume that there was only one breakage, not two, and 1.8.5 is broken as well.

I tried rebuilding the Arch package from the official PKGBUILD. Even with this, I could not trigger the crash! So for Arch users, this is a local workaround. After reinstalling the official binary package, I got a crash within a minute or two.

The two libX11.so.6.4.0 files are indeed different, but I can't tell if the differences are meaningful. Addresses and orders are different, but the list of exported symbols is the same. The two libX11-xcb.so.1.0.0 files are the same size (13976 bytes), and the diff is small:

--- official-xcb.hex	2024-05-29 12:23:39.711349095 +0200
+++ mine-xcb.hex	2024-05-29 12:23:45.944791641 +0200
@@ -45,8 +45,8 @@
 000002c0: 0300 0000 0000 0000 0100 01c0 0400 0000  ................
 000002d0: 0100 0000 0000 0000 0200 01c0 0400 0000  ................
 000002e0: 0000 0000 0000 0000 0400 0000 1400 0000  ................
-000002f0: 0300 0000 474e 5500 cf41 1b11 8b82 2d5d  ....GNU..A....-]
-00000300: ad43 7b2d 9bad ab79 884a bc70 0000 0000  .C{-...y.J.p....
+000002f0: 0300 0000 474e 5500 7e7a c198 eaf0 26ab  ....GNU.~z....&.
+00000300: 1636 87ec 7be5 6f04 a5b9 943b 0000 0000  .6..{.o....;....
 00000310: 0200 0000 0500 0000 0100 0000 0600 0000  ................
 00000320: 0000 0200 0005 0008 0500 0000 0600 0000  ................
 00000330: 6be4 cc2e 3b9a cb9a 0000 0000 0000 0000  k...;...........
@@ -767,10 +767,10 @@
 00002fe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00002ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00003000: 0040 0000 0000 0000 4743 433a 2028 474e  .@......GCC: (GN
-00003010: 5529 2031 332e 322e 3120 3230 3233 3038  U) 13.2.1 202308
-00003020: 3031 0000 6c69 6258 3131 2d78 6362 2e73  01..libX11-xcb.s
+00003010: 5529 2031 342e 312e 3120 3230 3234 3035  U) 14.1.1 202405
+00003020: 3037 0000 6c69 6258 3131 2d78 6362 2e73  07..libX11-xcb.s
 00003030: 6f2e 312e 302e 302e 6465 6275 6700 0000  o.1.0.0.debug...
-00003040: 164f e2c3 002e 7368 7374 7274 6162 002e  .O....shstrtab..
+00003040: fdef bdc9 002e 7368 7374 7274 6162 002e  ......shstrtab..
 00003050: 6e6f 7465 2e67 6e75 2e70 726f 7065 7274  note.gnu.propert
 00003060: 7900 2e6e 6f74 652e 676e 752e 6275 696c  y..note.gnu.buil
 00003070: 642d 6964 002e 676e 752e 6861 7368 002e  d-id..gnu.hash..

This does give a clue: apparently the official binary package was compiled with GCC 13.2.1, whereas I'm using GCC 14.1.1. This explains the differences in libX11.so.6.4.0 as well. But I don't think GCC is to blame here – it's probably just a subtle difference that causes the actual (probably thread-related) bug to manifest or not.
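
The compiler string can also be pulled out of the binaries directly instead of being spotted in a hex diff. A minimal sketch, assuming the identification string has the `GCC: (GNU) <version> <date>` form seen in the dump above; `gcc_idents` is a hypothetical helper name, and the two byte strings are reconstructed from the diff rather than read from real files.

```python
import re

# Compiler identification strings of the form seen in the hex dump,
# e.g. b"GCC: (GNU) 13.2.1 20230801".
_GCC_IDENT = re.compile(rb"GCC: \(GNU\) [0-9][0-9.]* [0-9]{8}")


def gcc_idents(blob: bytes) -> list:
    """Return the distinct GCC identification strings found in raw bytes, in order."""
    seen = []
    for match in _GCC_IDENT.finditer(blob):
        text = match.group().decode("ascii")
        if text not in seen:
            seen.append(text)
    return seen


# Bytes reconstructed from the two sides of the diff (from offset 0x3000 onward).
official = b"\x00@\x00\x00\x00\x00\x00\x00GCC: (GNU) 13.2.1 20230801\x00\x00libX11-xcb.so"
rebuilt = b"\x00@\x00\x00\x00\x00\x00\x00GCC: (GNU) 14.1.1 20240507\x00\x00libX11-xcb.so"

print(gcc_idents(official))
print(gcc_idents(rebuilt))
```

On a real system the input would be the contents of the library file read in binary mode, for example `gcc_idents(open(path, "rb").read())` with `path` pointing at the installed libX11-xcb.so.1.0.0.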

Not being able to reproduce this in my own build, even before adding debug information, makes this thing very hard to debug, but I'll keep trying.

@ttencate
Contributor

I installed the gcc13 package and used it to compile libX11 again from the official PKGBUILD, but modified with CC=gcc-13 CPP=cpp-13 AR=gcc-ar-13 NM=gcc-nm-13 RANLIB=gcc-ranlib-13 before the ./configure command. (Not sure all of these are necessary or even correct; CC is the main one.) Even this didn't help to reproduce the crash.


Something I found in the core dump: at the time of the crash, there were two threads interacting with xcb. The main thread, that aborted:

...
#19 0x00007418b78c1c67 in __assert_fail (
    assertion=assertion@entry=0x7418b6d64528 "!xcb_xlib_unknown_req_in_deq", 
    file=file@entry=0x7418b6d644df "xcb_io.c", line=line@entry=175, 
    function=function@entry=0x7418b6d77310 <__PRETTY_FUNCTION__.6> "dequeue_pending_request") at assert.c:103
#20 0x00007418b6cfbcef in dequeue_pending_request (dpy=dpy@entry=0x62c40271a710, req=req@entry=0x74187000c270)
    at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:175
#21 0x00007418b6cfec95 in poll_for_response (dpy=dpy@entry=0x62c40271a710)
    at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:381
#22 0x00007418b6d019b2 in _XEventsQueued (dpy=0x62c40271a710, mode=<optimized out>)
    at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:441
#23 0x00007418b6cdecdf in XFlush (dpy=0x62c40271a710) at /usr/src/debug/libx11/libX11-1.8.9/src/Flush.c:39
#24 0x000062c3f636ae5c in DisplayServerX11::_wait_for_events (this=this@entry=0x62c4026fbe50)
    at platform/linuxbsd/x11/display_server_x11.cpp:4048
#25 0x000062c3f636d070 in DisplayServerX11::_poll_events (this=0x62c4026fbe50)
    at platform/linuxbsd/x11/display_server_x11.cpp:4074
#26 0x000062c3f9fc3e2d in Thread::callback (p_caller_id=<optimized out>, p_settings=..., 
    p_callback=0x62c3f636d0b0 <DisplayServerX11::_poll_events_thread(void*)>, p_userdata=0x62c4026fbe50)
    at core/os/thread.cpp:61
#27 0x000062c3fa8a60e4 in execute_native_thread_routine ()
#28 0x00007418b791fded in start_thread (arg=<optimized out>) at pthread_create.c:447
#29 0x00007418b79a30dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

And a thread that appears to belong to AMD's Vulkan driver:

#0  0x00007418b799539d in __GI___poll (fds=fds@entry=0x74187edffae8, nfds=nfds@entry=1, 
    timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007418b6ca420b in poll (__timeout=-1, __nfds=1, __fds=0x74187edffae8) at /usr/include/bits/poll2.h:39
#2  _xcb_conn_wait (c=c@entry=0x62c40271b9d0, vector=vector@entry=0x0, count=count@entry=0x0, 
    cond=<optimized out>) at /usr/src/debug/libxcb/libxcb-1.17.0/src/xcb_conn.c:510
#3  0x00007418b6ca629b in _xcb_conn_wait (count=0x0, vector=0x0, cond=<optimized out>, c=0x62c40271b9d0)
    at /usr/src/debug/libxcb/libxcb-1.17.0/src/xcb_conn.c:476
#4  xcb_wait_for_special_event (c=0x62c40271b9d0, se=0x62c402b86190)
    at /usr/src/debug/libxcb/libxcb-1.17.0/src/xcb_in.c:806
#5  0x00007418a31c18f0 in ?? () from /usr/lib/amdvlk64.so
#6  0x00007418a31bd495 in ?? () from /usr/lib/amdvlk64.so
#7  0x00007418a31df714 in ?? () from /usr/lib/amdvlk64.so
#8  0x00007418a322ed61 in ?? () from /usr/lib/amdvlk64.so
#9  0x00007418b791fded in start_thread (arg=<optimized out>) at pthread_create.c:447
#10 0x00007418b79a30dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The latter is hanging in a poll call, so it wasn't actively racing at the time of the crash, but it's an interesting tidbit that might be a reason why Godot suffers from this bug and other applications don't. I tried with the vkcube spinning cube Vulkan demo; I couldn't get it to crash, but upon killing it with SIGQUIT (Ctrl+\), this shows the same amdvlk backtrace in its coredump as well.

@ttencate
Contributor

ttencate commented May 29, 2024

Line numbers refer to libx11 1.8.9, although the file src/xcb_io.c hasn't been touched in two years.

On line 319 in poll_for_response(), we set:

        req = dpy->xcb->pending_requests;

There is no code that modifies the req pointer in the meantime. Then, if there is actually a pending request and some other conditions hold, the pending request is dequeued:

        dequeue_pending_request(dpy, req);

And the first thing that function does, is to fail the assertion:

    if (req != dpy->xcb->pending_requests)
        throw_thread_fail_assert("Unknown request in queue while "
                                 "dequeuing",
                                 xcb_xlib_unknown_req_in_deq);

Since req is a local variable and hasn't been changed, this must mean that dpy->xcb->pending_requests has been changed in the meantime. The culprit must have been either some invalid memory access on the same thread, or a race condition from a different thread. My money is on the latter. (It could theoretically also have been some callback that performed a reentrant libx11 call, but I don't see any place where callbacks are invoked here; also, it would imply a lack of locking somewhere, same as a threading issue.)
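
To make the argument concrete, here is a toy, single-threaded model of the same check. It is only a sketch: `Display`, `head` and the `interfere` hook are made up for illustration, and the hook merely stands in for "something else touched the queue between the snapshot and the dequeue". None of this is libX11 code.

```python
from collections import deque


class Display:
    """Toy stand-in for the Xlib display: only the pending-request queue is modelled."""

    def __init__(self, requests):
        self.pending_requests = deque(requests)


def head(dpy):
    """Current head of the queue, or None when it is empty."""
    return dpy.pending_requests[0] if dpy.pending_requests else None


def dequeue_pending_request(dpy, req):
    # Same idea as the check in xcb_io.c: the caller's snapshot must still be the head.
    if req != head(dpy):
        raise AssertionError("Unknown request in queue while dequeuing")
    dpy.pending_requests.popleft()


def poll_for_response(dpy, interfere=None):
    req = head(dpy)            # snapshot, like `req = dpy->xcb->pending_requests`
    if req is None:
        return None
    if interfere is not None:
        interfere(dpy)         # hypothetical: another actor mutates the queue here
    dequeue_pending_request(dpy, req)
    return req


quiet = Display(["a", "b"])
print(poll_for_response(quiet))   # snapshot is still the head, so this succeeds

raced = Display(["a", "b"])
try:
    poll_for_response(raced, interfere=lambda d: d.pending_requests.popleft())
except AssertionError as err:
    print("assertion:", err)      # the snapshot went stale before the dequeue
```

The local snapshot never changes in either run; only the shared queue does, which is the same conclusion as above.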

It should be noted that we are in an XFlush() call, which is a critical section, calling LockDisplay() at the start and UnlockDisplay() at the end. So if this is a threading issue, we'd want to look for places that modify pending_requests without issuing such a lock.

There are only two such places that matter: append_pending_request and dequeue_pending_request. So I set a conditional breakpoint in both, with the condition dpy->lock->mutex->__data->__owner == 0 (relying on some pthread internals to check if the mutex is locked). After a few minutes, the breakpoint was hit, yielding the following stack trace:

#0  dequeue_pending_request (dpy=dpy@entry=0x55555cd1fde0, req=req@entry=0x55556a1df6f0)
    at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:174
#1  0x00007ffff7103343 in _XReply (dpy=0x55555cd1fde0, rep=0x7fffffffdb00, extra=0, discard=0)
    at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:736
#2  0x00007ffff70e40f4 in XGetWindowProperty (dpy=0x55555cd1fde0, w=25165826, property=372, offset=0, 
    length=32, delete=<optimized out>, req_type=4, actual_type=0x7fffffffdbb8, actual_format=0x7fffffffdbb4, 
    nitems=0x7fffffffdbc0, bytesafter=0x7fffffffdbc8, prop=0x7fffffffdbd0)
    at /usr/src/debug/libx11/libX11-1.8.9/src/GetProp.c:69
#3  0x0000555555af1360 in DisplayServerX11::_window_minimize_check (this=this@entry=0x55555ccfc9f0, 
    p_window=p_window@entry=0) at platform/linuxbsd/x11/display_server_x11.cpp:2375
#4  0x0000555555af167f in DisplayServerX11::window_get_mode (this=0x55555ccfc9f0, p_window=0)
    at platform/linuxbsd/x11/display_server_x11.cpp:2705
#5  0x0000555555aeba48 in DisplayServerX11::can_any_window_draw (this=0x55555ccfc9f0)
    at platform/linuxbsd/x11/display_server_x11.cpp:2912
#6  0x0000555555b45426 in Main::iteration () at main/main.cpp:3685
#7  0x0000555555ad7311 in OS_LinuxBSD::run (this=this@entry=0x7fffffffddb0)
    at platform/linuxbsd/os_linuxbsd.cpp:958
#8  0x0000555555ac5176 in main (argc=<optimized out>, argv=0x7fffffffe398)
    at platform/linuxbsd/godot_linuxbsd.cpp:74

When continuing the program after the breakpoint is hit, it immediately crashes apologetically.

The API function XGetWindowProperty called from Godot does lock the mutex, but _XReply transiently unlocks it for a while. And apparently, by the time dequeue_pending_request is called here, the mutex is somehow not locked.
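
The conditional breakpoint boils down to one question: is the queue being modified while nobody holds the display lock? The same check can be sketched outside gdb as a toy model. Everything in it is hypothetical: `GuardedQueue` and its attributes are illustrative names, `threading.Lock.locked()` stands in for the `__owner == 0` test, and the "transient unlock that is never undone" is only an assumed failure shape, not a claim about what `_XReply` actually does.

```python
import threading


class GuardedQueue:
    """Toy queue that records every mutation made while its lock is not held."""

    def __init__(self):
        self.lock = threading.Lock()    # stands in for the display lock
        self.items = []                 # stands in for pending_requests
        self.unlocked_mutations = []    # what the breakpoint would have caught

    def _note_if_unlocked(self, where):
        # Like the gdb condition: fires only when nobody holds the lock.
        if not self.lock.locked():
            self.unlocked_mutations.append(where)

    def append_pending_request(self, req):
        self._note_if_unlocked("append_pending_request")
        self.items.append(req)

    def dequeue_pending_request(self):
        self._note_if_unlocked("dequeue_pending_request")
        return self.items.pop(0)


q = GuardedQueue()
with q.lock:                            # well-behaved caller: mutation under the lock
    q.append_pending_request("GetProperty")

q.lock.acquire()
q.lock.release()                        # assumed: a transient unlock that is never undone
q.dequeue_pending_request()             # mutation with the lock not held

print(q.unlocked_mutations)             # ['dequeue_pending_request']
```

Like the gdb condition, this only detects that nobody held the lock at the moment of mutation; it says nothing about which code path dropped it.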

This is as far as I got for today. I tried setting more breakpoints in _XReply to find out where the lock is lost, but the breakpoints all end up at the top of the function for some reason, and also seem to interfere with my ability to trigger the crash. Stupid Heisenbug.

@Some1and2-XC

Hey, this seems to still be an issue running on Ubuntu 24.04 running x11 and KDE.

Godot Engine v4.3.stable.official.77dcf97d8 - https://godotengine.org
Vulkan 1.3.274 - Forward+ - Using Device #0: Intel - Intel(R) UHD Graphics (ICL GT1)

[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
project.x86_64: ../../src/xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
Aborted (core dumped)

I wonder if this is an intel integrated graphics thing? Also should I try and update something to fix this?

@novalis
Contributor

novalis commented Nov 17, 2024 via email

@Invium-GH

Summarizing the reports above:

  • libx11 1.8.2 - OK (unless patched)
  • libx11 1.8.3 - broken
  • libx11 1.8.4 - mixed
  • libx11 1.8.5 - OK (but see below)
  • libx11 1.8.6 - broken
  • libx11 1.8.7 - broken
  • libx11 1.8.8 - unknown
  • libx11 1.8.9 - broken

1.8.10 - Broken.

I have been struggling with this very same crash for some time now in my personal project and have also resorted to trying to create an MRP to force the crash. The intent is that if I could find a way to force the crash, Godot's developers could maybe use that information to find a workaround.

So far, no luck in forcing it. Will continue to attempt it though.

Also, so far I have been unable to eradicate it completely from my project with workarounds, although I do think I have managed to reduce the occurrence rate. But this could just be a feeling, as the crash is very random in nature and very hard to predict and to reproduce.

@hpvb hpvb self-assigned this Jan 14, 2025
@hpvb hpvb modified the milestones: 4.x, 4.4 Jan 14, 2025
@hpvb hpvb moved this from Unassessed to Release Blocker in 4.x Release Blockers Jan 14, 2025
@Invium-GH

One thing I have noticed while struggling with finding a workaround or MRP to force the crash is that I have yet to see it happen with the editor or the running project being in focus.

I have seen both the running project as well as the editor itself (without the project running) crash as a result of this issue but never when in focus. Always when focus was on a different task. Most recent crash I've had was with Godot being open, my project not running and me just working on some C# code for the project, using an external editor (VSCode).

And any time my project itself runs into this crash it is with the project not being in focus.

Maybe this could help to guide someone more familiar with the interaction between Godot and the crashing library towards a workaround.

For example, is there something different between how Godot updates/draws windows which are in focus compared to how it updates/draws windows which are not in focus?

@Calinou
Member

Calinou commented Jan 25, 2025

For example, is there something different between how Godot updates/draws windows which are in focus compared to how it updates/draws windows which are not in focus?

The editor is capped to 10 FPS by default when unfocused. You can disable this behavior by enabling Update Continuously in the Editor Settings. (This disables the effect of the Low Processor Mode Sleep Usec and Unfocused Low Processor Mode Sleep Usec editor settings.)
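
For reference, the relation between a frame-rate cap and a per-frame sleep value in microseconds is plain arithmetic. The sketch below assumes the sleep dominates the frame time, and the helper names are made up; it does not read or set any Godot setting.

```python
USEC_PER_SECOND = 1_000_000


def sleep_usec_for_fps(fps: float) -> int:
    """Per-frame sleep (in microseconds) that yields at most `fps` frames per second."""
    return round(USEC_PER_SECOND / fps)


def max_fps_for_sleep_usec(sleep_usec: int) -> float:
    """Upper bound on frames per second when every frame sleeps `sleep_usec`."""
    return USEC_PER_SECOND / sleep_usec


print(sleep_usec_for_fps(10))            # 100000
print(max_fps_for_sleep_usec(100_000))   # 10.0
```

So a 10 FPS cap corresponds to roughly 100 ms between window updates, which is the "infrequently updated window" condition being discussed here.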

@Invium-GH

You can disable this behavior by enabling Update Continuously in the Editor Settings.

I haven't seen the editor crash since enabling this. It's been a total uptime of at least a few dozen hours by now. So it would appear there is something to the crash being more likely to happen with infrequently updated windows or windows with a lower frame rate. However, this is hardly a viable workaround: per the tooltip of this option, it is intended more for troubleshooting purposes.

Incidentally, is there something else about projects run in the background versus in focus? Like, some internal optimization the engine performs to save on CPU/GPU cycles for unfocused windows? So, not related to the editor but related to the engine itself.

Projects
Status: Release Blocker
Development

No branches or pull requests