X11 crash: [xcb] Unknown sequence number while processing queue #458

Closed
mitchmindtree opened this issue Apr 11, 2018 · 23 comments
Labels
B - bug Dang, that shouldn't have happened C - needs investigation Issue must be confirmed and researched D - hard Likely harder than most tasks here DS - x11 H - help wanted Someone please save us P - low Nice to have

Comments

@mitchmindtree
Contributor

The following error occurred at startup when running a native GUI program I'm working on. I've run a lot of GUI windows on X11 in the past year and I've never seen this error before, so thought I'd post it:

    Finished release [optimized + debuginfo] target(s) in 18.95 secs
     Running `target/release/audio_server`
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
audio_server: xcb_io.c:259: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.

rustc 1.26.0-nightly (9c9424de5 2018-03-27)
winit 0.12.0

It has only occurred once and I'm not entirely sure how to recreate it yet.

Curiously, the error suggests calling XInitThreads; however, it looks to me like we already do this.

The only thing I can think to note is that I do use an EventsLoopProxy to wake up the main (GUI) thread from a separate audio monitoring thread. There's a chance that EventsLoopProxy::wakeup gets called prior to a call to EventsLoop::run_forever. I have no idea if this has anything to do with the issue yet, just a thought.

@mitchmindtree mitchmindtree added B - bug Dang, that shouldn't have happened DS - x11 labels Apr 11, 2018
@azriel91
Contributor

Hiya, I have a reliable way of reproducing it (albeit I haven't made a minimal example).

  • Create an app that has multiple threads, and make each thread spawn a Window.
  • Start a new shell session, and run the app.
  • It should fail with the error.

Subsequent runs of the app in the same shell session work.

So, I hit this every time I run my project's automated tests, from a new shell session. In my setup, I have unit tests within the same module spawning their own Windows. If I run my tests with cargo test -- --test-threads=1, then it works.

Before upgrading to winit 0.12 I just had those tests under an #[ignore]. The version I was using previously predates the merge of #416 (0.10?).

@francesca64
Member

@azriel91 thanks for the info. My winit test app has one thread per window, but I can't reproduce following those instructions. I'm not using EventsLoopProxy though; are you?

(Also, this is sort of embarrassing, but I don't really understand what EventsLoopProxy is used for. @mitchmindtree, can you help fill me in on that?)

@azriel91
Contributor

azriel91 commented Apr 26, 2018

I'm not using EventsLoopProxy either.
I managed to make a minimal example that does reproduce it (not always, but easily when run in a loop):
https://github.com/azriel91/multithread_window

The assumption about "open a new shell" is incorrect; the issue can be reproduced if you launch the exe enough times. Here's some output using winit 0.13:

Spawned threads.
[xcb] Unknown sequence number while appending request
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
multithread_window: ../../src/xcb_io.c:147: append_pending_request: Assertion `!xcb_xlib_unknown_seq_number' failed.
Spawned threads.
Waited for 100 ms.
Spawned threads.
Waited for 100 ms.
Spawned threads.
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":1"
      after 175 requests (174 known processed) with 0 events remaining.
[xcb] Unknown request in queue while dequeuing
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
Spawned threads.
[xcb] Unknown sequence number while appending request
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
multithread_window: ../../src/xcb_io.c:147: append_pending_request: Assertion `!xcb_xlib_unknown_seq_number' failed.

I also managed to get another error, but maybe defer it to a different issue:

Spawned threads.
thread '<unnamed>' panicked at 'Failed to open input method: PotentialInputMethods {
    xmodifiers: Some(
        PotentialInputMethod {
            name: "@im=ibus",
            successful: Some(
                false
            )
        }
    ),
    fallbacks: [
        PotentialInputMethod {
            name: "@im=local",
            successful: Some(
                false
            )
        },
        PotentialInputMethod {
            name: "@im=",
            successful: Some(
                false
            )
        }
    ],
    _xim_servers: Ok(
        [
            "@im=ibus"
        ]
    )
}', /home/azriel/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.13.0/src/platform/linux/x11/mod.rs:82:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Running with RUST_BACKTRACE=1 makes this error take a while to happen, but when it does, it gives this:

note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:380
   3: std::panicking::default_hook
             at libstd/panicking.rs:396
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:576
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:537
   6: winit::platform::platform::x11::EventsLoop::new
             at /home/azriel/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.13.0/src/platform/linux/x11/mod.rs:82
   7: winit::platform::platform::EventsLoop::new_x11

@francesca64
Member

Ah, awesome, that works! I get a mixture of the XIO error, the XCB error, the XIM error, and the occasional segfault.

> I also managed to get another error, but maybe defer it to a different issue:

I think that error would probably have the same general root cause here.

This doesn't seem like it will be easy to track down. I figured the segfaults would be the easiest to investigate (and with some luck, would be related), so I ran with valgrind until I eventually got this: https://gist.github.com/francesca64/5e8512e58d5f728d429bdd28003b4c1a

Honestly, it's also possible there's nothing we can do about this. Fundamentally speaking, X11 isn't thread-safe, and we could be hitting some race condition somewhere.

Anyway, I looked at your example and noticed something. You create a new EventsLoop per thread, whereas my program uses one EventsLoop, then sends windows to their own threads and uses channels to forward events. With that approach, I've never had these issues. You also need to have only one EventsLoop if you want your application to work on macOS, where the EventsLoop needs to live on the main thread.

That said, doing things the way I suggest leads to any X11 call made from that thread blocking until you receive another event.
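
The single-loop layout described above can be sketched with std channels alone. This is only an illustration, not winit code: `WindowId`, `Event`, and `run_forwarding` are hypothetical stand-ins, and the "event loop" is just a `for` loop over a pre-built list of events on the calling thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-ins for winit's window id and event types.
type WindowId = u32;

#[derive(Debug, PartialEq)]
enum Event {
    Resized(u32, u32),
    CloseRequested,
}

/// Spawns one worker thread per window id and forwards each event to the
/// worker that owns the matching window. Returns how many events each
/// worker handled.
fn run_forwarding(
    events: Vec<(WindowId, Event)>,
    window_ids: &[WindowId],
) -> HashMap<WindowId, usize> {
    let mut senders: HashMap<WindowId, Sender<Event>> = HashMap::new();
    let mut workers = Vec::new();

    for &id in window_ids {
        let (tx, rx) = channel::<Event>();
        senders.insert(id, tx);
        let handle = thread::spawn(move || {
            let mut handled = 0usize;
            // Ends on CloseRequested or when the sender is dropped.
            for event in rx {
                handled += 1;
                if event == Event::CloseRequested {
                    break;
                }
            }
            handled
        });
        workers.push((id, handle));
    }

    // The single "event loop": only this thread dispatches events.
    for (id, event) in events {
        if let Some(tx) = senders.get(&id) {
            // A send error only means that worker has already exited.
            let _ = tx.send(event);
        }
    }
    drop(senders); // lets workers that never saw CloseRequested finish

    workers
        .into_iter()
        .map(|(id, handle)| (id, handle.join().expect("worker panicked")))
        .collect()
}

fn main() {
    let events = vec![
        (1, Event::Resized(800, 600)),
        (2, Event::Resized(640, 480)),
        (1, Event::CloseRequested),
        (2, Event::CloseRequested),
    ];
    println!("{:?}", run_forwarding(events, &[1, 2]));
}
```

In a real application the dispatching loop would be the one EventsLoop on the main thread, and each worker would own its Window.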

@francesca64
Member

Okay, I got the XCB error once while running Alacritty today. I don't know much about Alacritty's internals, but it has me wondering if this can happen in every winit application.

@azriel91
Contributor

> my program uses one EventsLoop, then sends windows to their own threads and uses channels to forward events. ... You also need to only have one EventsLoop if you want your application to work on macOS, where the EventsLoop needs to live on the main thread.

Oh I see, good to know, thanks! 🙂

I don't have control over the thread creation in my case (cargo test), but perhaps being clever with lazy_static and a Mutex would enable that level of automated testing.
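
One possible shape for that idea, as a rough sketch: a process-wide lock that every window-creating test takes first, so the tests serialize even when the harness runs them on several threads. It assumes a toolchain where `Mutex::new` is usable in a `static` (this replaces lazy_static here); `lock_windowing`, `windowed_test_body`, and the counters are hypothetical names used only to show the exclusion.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Process-wide lock; a std stand-in for a lazy_static-wrapped Mutex.
static WINDOW_TEST_LOCK: Mutex<()> = Mutex::new(());

// Counters used only to observe how many bodies run at the same time.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static MAX_ACTIVE: AtomicUsize = AtomicUsize::new(0);

/// Takes the global lock. A test that panicked while holding it poisons the
/// mutex; the guarded value is `()`, so the poison is ignored.
fn lock_windowing() -> MutexGuard<'static, ()> {
    WINDOW_TEST_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Hypothetical test body: everything that touches the windowing system
/// happens while the guard is alive.
fn windowed_test_body() {
    let _guard = lock_windowing();
    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    MAX_ACTIVE.fetch_max(now, Ordering::SeqCst);
    // ... create the EventsLoop and Window here ...
    thread::yield_now();
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
}

/// Runs `n` bodies on separate threads, as a parallel test harness would,
/// and returns the highest number that were inside the lock at once.
fn run_parallel(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(windowed_test_body))
        .collect();
    for handle in handles {
        handle.join().expect("test thread panicked");
    }
    MAX_ACTIVE.load(Ordering::SeqCst)
}

fn main() {
    println!("max concurrent bodies: {}", run_parallel(8));
}
```

This only serializes the tests, much like --test-threads=1 but limited to the tests that opt in. It does not make concurrent event loops safe.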

@francesca64 francesca64 added H - help wanted Someone please save us C - needs investigation Issue must be confirmed and researched D - hard Likely harder than most tasks here P - low Nice to have labels May 6, 2018
@MrMinimal

I think I ran into the same problem using the amethyst game engine.
Also I attached a stacktrace of my coredump in the associated issue.
If you want me to provide a coredump to debug I am happy to help!

@francesca64
Member

@MrMinimal thanks. However, if you want this fixed soon, someone other than me will likely have to work on this.

  • Is anything different if you run winit from master?
  • Does this only happen if you open winit applications in quick succession?

@jwilm
Contributor

jwilm commented May 14, 2018

> (Also, this is sort of embarrassing, but I don't really understand what EventsLoopProxy is used for. @mitchmindtree, can you help fill me in on that?)

Alacritty uses EventsLoopProxy to wake up the event loop / render thread when the terminal state is updated from our I/O thread. This is only necessary in our case since the render thread is also the events loop thread, and there's no other way to wake-up the event loop from another thread without EventsLoopProxy.
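
As a rough model of that wake-up pattern (not winit's implementation), the proxy can be thought of as a cloneable, sendable handle that pushes a message into a queue the loop thread blocks on. `Proxy`, `LoopEvent`, `create_loop`, and `run_until_quit` below are hypothetical names built on a std channel.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for the events the loop can receive.
enum LoopEvent {
    Awakened,
    Quit,
}

/// Hypothetical stand-in for EventsLoopProxy: cheap to clone and safe to
/// move to another thread.
#[derive(Clone)]
struct Proxy {
    tx: Sender<LoopEvent>,
}

impl Proxy {
    /// Asks the loop thread to wake up; returns false if the loop is gone.
    fn wakeup(&self) -> bool {
        self.tx.send(LoopEvent::Awakened).is_ok()
    }

    /// Asks the loop thread to stop.
    fn quit(&self) -> bool {
        self.tx.send(LoopEvent::Quit).is_ok()
    }
}

fn create_loop() -> (Proxy, Receiver<LoopEvent>) {
    let (tx, rx) = channel();
    (Proxy { tx }, rx)
}

/// Blocks the calling thread, in the spirit of run_forever, until Quit
/// arrives. Returns how many wake-ups were handled (think: redraws).
fn run_until_quit(rx: Receiver<LoopEvent>) -> usize {
    let mut redraws = 0;
    for event in rx {
        match event {
            LoopEvent::Awakened => redraws += 1,
            LoopEvent::Quit => break,
        }
    }
    redraws
}

fn main() {
    let (proxy, rx) = create_loop();
    let io_proxy = proxy.clone();
    // Stand-in for an I/O thread that reports state changes.
    let io = thread::spawn(move || {
        for _ in 0..3 {
            io_proxy.wakeup();
        }
        io_proxy.quit();
    });
    let redraws = run_until_quit(rx);
    io.join().expect("I/O thread panicked");
    println!("redraws: {}", redraws);
}
```

In this model a wake-up sent before the loop starts running is simply queued. Whether winit's X11 backend behaves the same way is not established in this thread.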

@francesca64
Member

@jwilm thanks, though fortunately I already got clarification on that. #462 (comment)

@MrMinimal

@francesca64 Running the fullscreen example from master, I can't seem to reproduce the issue. Is there a certain example which would be prone to producing this one?

@francesca64
Member

@MrMinimal did you try here? https://github.com/azriel91/multithread_window (you'll need to change the Cargo.toml)

@MrMinimal

@francesca64 Thanks for the fast responses! Just tried the link you provided but could not reproduce it with that example. The amethyst examples using winit still reliably reproduce it.

@francesca64
Member

Maybe this was fixed by #491? I haven't tested it since then. I'll check when I get a chance.

Amethyst is still on winit 0.13.1, which doesn't include that PR.

@MrMinimal

@francesca64 Oh, that might explain it! Thank you for the input; I don't have much experience with amethyst. I'd be interested in your findings, so tell me if I can help in any way!

@jwilm
Contributor

jwilm commented May 16, 2018

@francesca64

> @jwilm thanks, though fortunately I already got clarification on that.

My bad; sorry! I searched this issue as a precaution but my search ended there.

@francesca64
Member

@MrMinimal I can still reproduce this easily by looping the binary: while true; do target/debug/multithread_window; done. Was this something that happened to you frequently on older versions? It's supposed to be extremely rare in normal usage (at least, that's why I have the priority set to low).

@jwilm there's no need to apologize. There's an awful lot to keep track of.

@MrMinimal

@francesca64 You are right, I can still reproduce it that way, but it happens once in 60 runs. With amethyst it happens more like once in 5 runs. So yes, it happens on older versions more frequently.

@francesca64
Member

@MrMinimal I've only ever encountered this once in normal usage, which is fortunate, because I'm not optimistic about being able to fix this soon. Also, this is off-topic, but I saw you over on gilrs (I watch that repo) asking about DirectInput. Do you want to implement that yourself? I was planning on doing it when I get a chance; it looks straightforward judging by the SDL source that was linked to.

@MrMinimal

@francesca64 offtopic answer:
I have not yet had the time to look into it. Currently amethyst has an issue which tracks available input solutions and their advantages/disadvantages: amethyst/amethyst#414.
I also found a repo which does everything we want https://github.com/Jonesey13/multiinput-rust but only for Windows (Raw Input seems superior to DirectInput in terms of supporting more hardware). So I was thinking about merging multiinput-rust's features into gilrs, but I don't have the time at all.

@francesca64
Member

francesca64 commented Jun 6, 2018

More people keep encountering this, so I thought it would be a good idea to actually try to fix it!

Running off of this branch #554, multithread_window generates no errors for me.

@b-r-u
Contributor

b-r-u commented Jun 6, 2018

@francesca64 Your branch works for me as well. This is really great! It must have been frustrating trying to find the culprit here.

@francesca64
Member

It was actually surprisingly okay! I really did expect to sink my whole day into it, though.

What immediately caught my attention was the fact that a single application-global XConnection was created via lazy_static. winit is big enough that there are still parts of the codebase I haven't really seen, and seeing this design choice surprised me.

I thought to myself "what happens if two threads try to initialize that concurrently?" so I switched it to thread_local. I also tried using Once instead, but that didn't help.
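
The difference between the two storage choices can be shown without X11 at all. In this sketch `Connection` is a hypothetical stand-in for XConnection that just records a unique id when opened; `OnceLock` plays the role of the lazily initialized global (it needs a reasonably recent toolchain), and `thread_local!` gives each thread its own instance. It does not reproduce the crash; it only shows which threads end up sharing a connection.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for XConnection: opening one hands out a new id.
struct Connection {
    id: usize,
}

impl Connection {
    fn open() -> Connection {
        Connection {
            id: NEXT_ID.fetch_add(1, Ordering::SeqCst),
        }
    }
}

// Application-global: initialized once, then shared by every thread.
static GLOBAL: OnceLock<Connection> = OnceLock::new();

// Thread-local: each thread lazily opens and owns its own connection.
thread_local! {
    static LOCAL: Connection = Connection::open();
}

fn global_id() -> usize {
    GLOBAL.get_or_init(Connection::open).id
}

fn local_id() -> usize {
    LOCAL.with(|conn| conn.id)
}

/// Calls `f` on `n` fresh threads and collects the connection ids they saw.
fn ids_from_threads(f: fn() -> usize, n: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(f)).collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("thread panicked"))
        .collect()
}

fn main() {
    println!("global:       {:?}", ids_from_threads(global_id, 3));
    println!("thread-local: {:?}", ids_from_threads(local_id, 3));
}
```

With the global, every thread reports the same id; with the thread-local, every thread reports a different one, so no connection is ever touched by two threads.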

That only fixed the XCB and XIO errors, but not the XIM errors and segfaults. Introducing a global lock guarding XOpenIM was an easy guess, since we used to have one before my XIM rewrite. I removed it, since it didn't seem to solve any of the myriad thread-safety problems XIM has, but I guess I never considered the case of multiple concurrent event loops. Actually, I should probably guard all XIM calls; maybe doing that would finally fix this other weird issue: #347 (comment)
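
A guard around all XIM calls could take roughly this shape: a helper that holds one global lock for the duration of a closure. This is a sketch under assumptions, not the code from #554. `with_xim_lock` and `fake_open_im` are hypothetical, and the fake call only records enter/exit markers so that overlap would be visible.

```rust
use std::sync::Mutex;
use std::thread;

// One lock for every XIM entry point, regardless of which thread calls it.
static XIM_LOCK: Mutex<()> = Mutex::new(());

// Records the order in which the fake call is entered and left.
static TRACE: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// Runs `call` while holding the global XIM lock and returns its result.
/// The guarded value is `()`, so a poisoned lock is simply reused.
fn with_xim_lock<R>(call: impl FnOnce() -> R) -> R {
    let _guard = XIM_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    call()
}

/// Hypothetical stand-in for a non-thread-safe call such as XOpenIM.
fn fake_open_im() {
    TRACE.lock().unwrap().push("enter");
    thread::yield_now(); // give other threads a chance to interleave
    TRACE.lock().unwrap().push("exit");
}

/// Makes the guarded call from `n` threads and returns the recorded trace.
fn open_on_threads(n: usize) -> Vec<&'static str> {
    TRACE.lock().unwrap().clear();
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| with_xim_lock(fake_open_im)))
        .collect();
    for handle in handles {
        handle.join().expect("thread panicked");
    }
    TRACE.lock().unwrap().clone()
}

fn main() {
    println!("{:?}", open_on_threads(4));
}
```

Because every call goes through the same lock, each "enter" is followed by its own "exit" before another thread gets in. Wrapping calls in a closure also makes it harder to forget the guard at a new call site.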

tmfink pushed a commit to tmfink/winit that referenced this issue Jan 5, 2022