From 2f917fda238ffd695d25e5a5fb7133eae355aba5 Mon Sep 17 00:00:00 2001
From: Yuxuan Shui
Date: Sat, 12 Oct 2024 17:56:03 +0100
Subject: [PATCH] core: fix poll for event loop (i hope)

This partially reverts c57b705ec438a205d05d7c6f200770839058a225, the
previous fix for #1345.

The previous fix falsely assumed that if `x_poll_for_event` returned
nothing, it must not have read from the X connection. This is simply not
true. It worked, probably because it's impossible to have vblank events
close together (they arrive only once per frame). But that's an
assumption we probably shouldn't make.

_Separately_, we have another problem: we assumed that if
`x_poll_for_event` returned nothing, we don't need to flush again. This
is not true either. There could be pending requests being completed,
whose callback functions might have sent new requests. This, for
example, causes `picom-inspect` to hang, because some requests are stuck
in the outgoing buffer.

The explanation of the fix is going to be long, because this kind of
problem is never simple.

First of all, we need a version of `x_poll_for_event` that actually
guarantees not to read from X. The current version of
`x_poll_for_event` is already extremely complex, and I don't want to
pile more complexity on top of it. So instead we now use a different
approach, one that some might consider an ugly hack.

The complexity of `x_poll_for_event` all stems from the lack of
`xcb_poll_for_queued_{reply,special_event}` APIs, so we had to use
complex logic to merge messages from different streams (events, special
events, replies, and errors), all the while trying our best (and
failing) to handle all messages in the xcb buffer before going to
sleep. There is an `xcb_poll_for_queued_event` which in theory could be
used for this and would make things a lot simpler; the problem is that
we don't control events, so if there is a large gap between event
arrivals, picom would just be sitting there doing nothing.

And that's exactly what we do in this commit, that is, controlling
events.
Every time we wait for replies/errors for some requests, we send a
request that is guaranteed to _fail_. This works because unchecked
errors are returned by `xcb_poll_for_queued_event` as events. This way,
we can be sure an event will be received in a bounded amount of time,
so we won't hang. This vastly simplifies the logic in
`x_poll_for_event`, and makes adding a no-read version of it trivial.

Now that we have that, some care needs to be taken to make sure that we
_do_ read from X sometimes, otherwise we would go to sleep without
reading anything, and be woken up again immediately by the file
descriptor. This would result in an infinite loop.

This has some ripple effects; for example, we now queue a redraw
whenever the wm tree changes, whereas before, redraws were only queued
when the wm tree became consistent.

Even with this, we still need the `while(true)` loop in
`handle_x_events`, because of the other problem we mentioned at the
beginning - flushing. The way we fix the flushing problem is by
tracking the completion of pending requests - if any requests were
completed, we flush conservatively (meaning even when not strictly
necessary) - simple enough. But flushing could read (yes, read) from X
too! So whenever we flush, we need to call `x_poll_for_event` again,
hence the while loop.

Fixes: #1345

Signed-off-by: Yuxuan Shui
---
 src/picom.c  |  57 +++++++-------
 src/vblank.c |  14 ++--
 src/vblank.h |   8 +-
 src/x.c      | 208 ++++++++++++++++-----------------------------
 src/x.h      |  44 +++++++----
 5 files changed, 128 insertions(+), 203 deletions(-)

diff --git a/src/picom.c b/src/picom.c
index 1056234b13..86dd117ae5 100644
--- a/src/picom.c
+++ b/src/picom.c
@@ -1449,8 +1449,7 @@ static void unredirect(session_t *ps) {
 /// keeps an internal queue of events, so we have to be 100% sure no events are
 /// left in that queue before we go to sleep.
 static void handle_x_events(struct session *ps) {
-	bool wm_was_consistent = wm_is_consistent(ps->wm);
-
+	uint32_t latest_completed = ps->c.latest_completed_request;
 	while (true) {
 		// Flush because if we go into sleep when there is still requests
 		// in the outgoing buffer, they will not be sent for an indefinite
@@ -1460,47 +1459,44 @@ static void handle_x_events(struct session *ps) {
 		// Also note, `xcb_flush`/`XFlush` may _read_ more events from the server
 		// (yes, this is ridiculous, I know). But since we are polling for events
 		// in a loop, this should be fine - if we read events here, they will be
-		// handled below; and if some requests is sent later in this loop, which
-		// means some events must have been received, we will loop once more and
-		// get here to flush them.
+		// handled below; and if some requests is sent later in this loop, we will
+		// set `needs_flush` and loop once more and get here to flush them.
 		XFlush(ps->c.dpy);
 		xcb_flush(ps->c.c);

-		// We have to check for vblank events (i.e. special xcb events) and normal
-		// events in a loop. This is because both `xcb_poll_for_event` and
-		// `xcb_poll_for_special_event` will read data from the X connection and
-		// put it in a buffer. So whichever one we call last, say
-		// `xcb_poll_for_special_event`, will read data into the buffer that might
-		// contain events that `xcb_poll_for_event` should handle, and vice versa.
-		// This causes us to go into sleep with events in the buffer.
-		//
-		// We have to keep calling both of them until neither of them return any
-		// events.
-		bool has_events = false;
+		// If we send any new requests, we should loop again to flush them. There
+		// is no direct way to do this from xcb. So if we called `ev_handle`, or
+		// if any pending requests were completed, we conservatively loop again.
+		bool needs_flush = false;
 		if (ps->vblank_scheduler) {
-			has_events = vblank_handle_x_events(ps->vblank_scheduler) ==
-			             VBLANK_HANDLE_X_EVENTS_OK;
+			vblank_handle_x_events(ps->vblank_scheduler);
 		}

 		xcb_generic_event_t *ev;
-		while ((ev = x_poll_for_event(&ps->c))) {
-			has_events = true;
+		while ((ev = x_poll_for_event(&ps->c, true))) {
 			ev_handle(ps, (xcb_generic_event_t *)ev);
+			needs_flush = true;
 			free(ev);
 		};

-		if (!has_events) {
+		if (ps->c.latest_completed_request != latest_completed) {
+			needs_flush = true;
+			latest_completed = ps->c.latest_completed_request;
+		}
+
+		if (!needs_flush) {
 			break;
 		}
 	}
+
 	int err = xcb_connection_has_error(ps->c.c);
 	if (err) {
 		log_fatal("X11 server connection broke (error %d)", err);
 		exit(1);
 	}

-	if (wm_is_consistent(ps->wm) != wm_was_consistent && !wm_was_consistent) {
-		log_debug("Window tree has just become consistent, queueing redraw.");
+	if (wm_has_tree_changes(ps->wm)) {
+		log_debug("Window tree changed, queueing redraw.");
 		ps->pending_updates = true;
 		queue_redraw(ps);
 	}
@@ -1966,9 +1962,18 @@ static void draw_callback(EV_P_ ev_timer *w, int revents) {
 	}
 }

-static void x_event_callback(EV_P attr_unused, ev_io * /*w*/, int revents attr_unused) {
-	// This function is intentionally left blank, events are actually read and handled
-	// in the ev_prepare listener.
+static void x_event_callback(EV_P attr_unused, ev_io *w, int revents attr_unused) {
+	// Make sure the X connection is being read from at least once every time
+	// we woke up because of readability of the X connection.
+	//
+	// `handle_x_events` is not guaranteed to read from X, so if we don't do
+	// it here we could dead lock.
+	struct session *ps = session_ptr(w, xiow);
+	auto ev = x_poll_for_event(&ps->c, false);
+	if (ev) {
+		ev_handle(ps, (xcb_generic_event_t *)ev);
+		free(ev);
+	}
 }

 static void config_file_change_cb(void *_ps) {

diff --git a/src/vblank.c b/src/vblank.c
index e63b0a51cb..057f7d4058 100644
--- a/src/vblank.c
+++ b/src/vblank.c
@@ -69,7 +69,7 @@ struct vblank_scheduler_ops {
 	bool (*init)(struct vblank_scheduler *self);
 	void (*deinit)(struct vblank_scheduler *self);
 	bool (*schedule)(struct vblank_scheduler *self);
-	enum vblank_handle_x_events_result (*handle_x_events)(struct vblank_scheduler *self);
+	bool (*handle_x_events)(struct vblank_scheduler *self);
 };

 static void
@@ -444,14 +444,10 @@ static void handle_present_complete_notify(struct present_vblank_scheduler *self
 	ev_timer_start(self->base.loop, &self->callback_timer);
 }

-static enum vblank_handle_x_events_result
-handle_present_events(struct vblank_scheduler *base) {
+static bool handle_present_events(struct vblank_scheduler *base) {
 	auto self = (struct present_vblank_scheduler *)base;
 	xcb_present_generic_event_t *ev;
-	enum vblank_handle_x_events_result result = VBLANK_HANDLE_X_EVENTS_NO_EVENTS;
 	while ((ev = (void *)xcb_poll_for_special_event(base->c->c, self->event))) {
-		result = VBLANK_HANDLE_X_EVENTS_OK;
-
 		if (ev->event != self->event_id) {
 			// This event doesn't have the right event context, it's not meant
 			// for us.
@@ -464,7 +460,7 @@ handle_present_events(struct vblank_scheduler *base) {
 	next:
 		free(ev);
 	}
-	return result;
+	return true;
 }

 static const struct vblank_scheduler_ops vblank_scheduler_ops[LAST_VBLANK_SCHEDULER] = {
@@ -575,11 +571,11 @@ vblank_scheduler_new(struct ev_loop *loop, struct x_connection *c, xcb_window_t
 	return self;
 }

-enum vblank_handle_x_events_result vblank_handle_x_events(struct vblank_scheduler *self) {
+bool vblank_handle_x_events(struct vblank_scheduler *self) {
 	assert(self->type < LAST_VBLANK_SCHEDULER);
 	auto fn = vblank_scheduler_ops[self->type].handle_x_events;
 	if (fn != NULL) {
 		return fn(self);
 	}
-	return VBLANK_HANDLE_X_EVENTS_NO_EVENTS;
+	return true;
 }

diff --git a/src/vblank.h b/src/vblank.h
index 83bda69daa..a1916dc405 100644
--- a/src/vblank.h
+++ b/src/vblank.h
@@ -29,12 +29,6 @@ enum vblank_callback_action {
 	VBLANK_CALLBACK_DONE,
 };

-enum vblank_handle_x_events_result {
-	VBLANK_HANDLE_X_EVENTS_OK,
-	VBLANK_HANDLE_X_EVENTS_ERROR,
-	VBLANK_HANDLE_X_EVENTS_NO_EVENTS,
-};
-
 typedef enum vblank_callback_action (*vblank_callback_t)(struct vblank_event *event,
                                                          void *user_data);

@@ -53,4 +47,4 @@ vblank_scheduler_new(struct ev_loop *loop, struct x_connection *c, xcb_window_t
                     enum vblank_scheduler_type type, bool use_realtime_scheduling);
 void vblank_scheduler_free(struct vblank_scheduler *);

-enum vblank_handle_x_events_result vblank_handle_x_events(struct vblank_scheduler *self);
+bool vblank_handle_x_events(struct vblank_scheduler *self);

diff --git a/src/x.c b/src/x.c
index 74417f0c8e..9a51d06e36 100644
--- a/src/x.c
+++ b/src/x.c
@@ -166,6 +166,12 @@ void x_print_error_impl(unsigned long serial, uint8_t major, uint16_t minor,

 /// This function logs X errors, or aborts the program based on severity of the error.
 static void x_handle_error(struct x_connection *c, xcb_generic_error_t *ev) {
 	x_discard_pending_errors(c, ev->full_sequence);
+	if (c->event_sync_sent && ev->full_sequence == c->event_sync) {
+		// This is the error for our event sync request we have been
+		// expecting.
+		c->event_sync_sent = false;
+		return;
+	}
 	struct pending_x_error *first_error_action = NULL;
 	if (!list_is_empty(&c->pending_x_errors)) {
 		first_error_action =
@@ -232,7 +238,7 @@ void x_connection_init(struct x_connection *c, Display *dpy) {
 	c->screen = DefaultScreen(dpy);
 	c->screen_info = xcb_aux_get_screen(c->c, c->screen);

-	c->message_on_hold = NULL;
+	c->event_sync_sent = false;

 	// Do a round trip to fetch the current sequence number
 	auto cookie = xcb_get_input_focus(c->c);
@@ -1017,166 +1023,78 @@ static const xcb_raw_generic_event_t no_reply_success = {.response_type = 1};

 /// Read the X connection once, and return the first event or unchecked error. If any
 /// replies to pending X requests are read, they will be processed and callbacks will be
 /// called.
-static const xcb_raw_generic_event_t *
-x_poll_for_event_impl(struct x_connection *c, bool *retry) {
-	// This whole thing, this whole complexity arises from libxcb's idiotic API.
-	// What we need is a call to give us the next message from X, regardless if
-	// it's a reply, an error, or an event. But what we get is two different calls:
-	// `xcb_poll_for_event` and `xcb_poll_for_reply`. And there is this weird
-	// asymmetry: `xcb_poll_for_event` has a version that doesn't read from X, but
-	// `xcb_poll_for_reply` doesn't.
-	//
-	// This whole thing we are doing here, is trying to emulate the desired behavior
-	// with what we got.
-	if (c->first_request_with_reply == NULL) {
-		// No reply to wait for, just poll for events.
-		BUG_ON(!list_is_empty(&c->pending_x_requests));
-		return x_ingest_event(c, xcb_poll_for_event(c->c));
+static xcb_raw_generic_event_t *x_poll_for_event_impl(struct x_connection *c, bool queued) {
+	auto e = queued ? xcb_poll_for_queued_event(c->c) : xcb_poll_for_event(c->c);
+	if (e == NULL) {
+		return NULL;
 	}
-
-	// Now we need to first check if a reply to `first_request_with_reply` is
-	// available.
-	if (c->message_on_hold == NULL) {
-		xcb_generic_error_t *err = NULL;
-		auto has_reply = xcb_poll_for_reply(c->c, c->first_request_with_reply->sequence,
-		                                    (void *)&c->message_on_hold, &err);
-		if (has_reply != 0 && c->message_on_hold == NULL) {
-			c->message_on_hold = (xcb_raw_generic_event_t *)err;
+	auto seq = x_widen_sequence(c, e->full_sequence);
+	list_foreach_safe(struct x_async_request_base, i, &c->pending_x_requests, siblings) {
+		auto head_seq = x_widen_sequence(c, i->sequence);
+		if (head_seq > seq) {
+			break;
 		}
-		c->sequence_on_hold = c->first_request_with_reply->sequence;
-	}
-
-	// Even if we don't get a reply for `first_request_with_reply`, some of the
-	// `no_reply` requests might have completed. We will see if we got an event, and
-	// deduce that information from that.
-	auto event = xcb_poll_for_queued_event(c->c);
-
-	// As long as we have read one reply, we could have read an indefinite number of
-	// replies, so after we returned we might need to retry this function.
-	*retry = c->message_on_hold != NULL;
-
-	while (!list_is_empty(&c->pending_x_requests) &&
-	       (event != NULL || c->message_on_hold != NULL)) {
-		auto head = list_entry(c->pending_x_requests.next,
-		                       struct x_async_request_base, siblings);
-		auto head_seq = x_widen_sequence(c, head->sequence);
-		auto event_seq = UINT64_MAX;
-		if (event != NULL) {
-			event_seq = x_widen_sequence(c, event->full_sequence);
-			BUG_ON(event->response_type == 1);
-			if (head_seq > event_seq) {
-				// The event comes before the first pending request, we
-				// can return it first.
-				break;
-			}
+		if (head_seq == seq && e->response_type == 0) {
+			// Error replies are handled in `x_poll_for_event`.
+			break;
 		}
-
-		// event == NULL || head_seq <= event_seq
-		// If event == NULL, we have message_on_hold != NULL. And message_on_hold
-		// has a sequence number equals to first_request_with_reply, which must be
-		// greater or equal to head_seq; If event != NULL, we have head_seq <=
-		// event_seq. Either case indicates `head` has completed, so we can remove
-		// it from the list.
-		list_remove(&head->siblings);
-		if (c->first_request_with_reply == head) {
-			BUG_ON(head->no_reply);
-			c->first_request_with_reply = NULL;
-			list_foreach(struct x_async_request_base, i,
-			             &c->pending_x_requests, siblings) {
-				if (!i->no_reply) {
-					c->first_request_with_reply = i;
-					break;
-				}
+		auto reply_or_error = &no_reply_success;
+		if (!i->no_reply) {
+			// We have received something with sequence number `seq >=
+			// head_seq`, so we are sure that a reply for `i` is available in
+			// xcb's buffer, so we can safely call `xcb_poll_for_reply`
+			// without reading from X.
+			xcb_generic_error_t *err = NULL;
+			auto has_reply = xcb_poll_for_reply(
+			    c->c, i->sequence, (void **)&reply_or_error, &err);
+			BUG_ON(has_reply == 0);
+			if (reply_or_error == NULL) {
+				reply_or_error = (xcb_raw_generic_event_t *)err;
 			}
 		}
-		x_discard_pending_errors(c, head_seq);
-		c->last_sequence = head->sequence;
-
-		if (event != NULL) {
-			if (event->response_type == 0 && head_seq == event_seq) {
-				// `event` is an error response to `head`. `head` must be
-				// `no_reply`, otherwise its error will be returned by
-				// `xcb_poll_for_reply`.
-				BUG_ON(!head->no_reply);
-				head->callback(c, head, (xcb_raw_generic_event_t *)event);
-				free(event);
-
-				event = xcb_poll_for_queued_event(c->c);
-				continue;
-			}
-
-			// Here, we have 2 cases:
-			// a. event is an error && head_seq < event_seq
-			// b. event is a true event && head_seq <= event_seq
-			// In either case, we know `head` has completed. If it has a
-			// reply, the reply should be in xcb's queue. So we can call
-			// `xcb_poll_for_reply` safely without reading from X.
-			if (head->no_reply) {
-				head->callback(c, head, &no_reply_success);
-				continue;
-			}
-			if (c->message_on_hold == NULL) {
-				xcb_generic_error_t *err = NULL;
-				BUG_ON(xcb_poll_for_reply(c->c, head->sequence,
-				                          (void *)&c->message_on_hold,
-				                          NULL) == 0);
-				if (c->message_on_hold == NULL) {
-					c->message_on_hold = (xcb_raw_generic_event_t *)err;
-				}
-				c->sequence_on_hold = head->sequence;
-			}
-			BUG_ON(c->sequence_on_hold != head->sequence);
-			head->callback(c, head, c->message_on_hold);
-			free(c->message_on_hold);
-			c->message_on_hold = NULL;
-		}
-		// event == NULL => c->message_on_hold != NULL
-		else if (x_widen_sequence(c, c->sequence_on_hold) > head_seq) {
-			BUG_ON(!head->no_reply);
-			head->callback(c, head, &no_reply_success);
-		} else {
-			BUG_ON(c->sequence_on_hold != head->sequence);
-			BUG_ON(head->no_reply);
-			head->callback(c, head, c->message_on_hold);
-			free(c->message_on_hold);
-			c->message_on_hold = NULL;
+		c->latest_completed_request = i->sequence;
+		list_remove(&i->siblings);
+		i->callback(c, i, reply_or_error);
+		if (reply_or_error != &no_reply_success) {
+			free((void *)reply_or_error);
 		}
 	}
-
-	return x_ingest_event(c, event);
-}
-
-static void
-x_dummy_async_callback(struct x_connection * /*c*/, struct x_async_request_base *req_base,
-                       const xcb_raw_generic_event_t * /*reply_or_error*/) {
-	free(req_base);
+	return x_ingest_event(c, e);
 }

-xcb_generic_event_t *x_poll_for_event(struct x_connection *c) {
-	const xcb_raw_generic_event_t *ret = NULL;
+xcb_generic_event_t *x_poll_for_event(struct x_connection *c, bool queued) {
+	xcb_raw_generic_event_t *ret = NULL;
 	while (true) {
-		if (c->first_request_with_reply == NULL &&
-		    !list_is_empty(&c->pending_x_requests)) {
-			// All requests we are waiting for are no_reply. We would have no
-			// idea if any of them completed, until a subsequent event is
-			// received, which can take an indefinite amount of time. So we
-			// insert a GetInputFocus request to ensure we get a reply.
-			auto req = ccalloc(1, struct x_async_request_base);
-			req->sequence = xcb_get_input_focus(c->c).sequence;
-			req->callback = x_dummy_async_callback;
-			x_await_request(c, req);
-		}
-		bool retry = false;
-		ret = x_poll_for_event_impl(c, &retry);
-		if (ret == NULL && !list_is_empty(&c->pending_x_requests) && retry) {
-			continue;
+		if (!list_is_empty(&c->pending_x_requests) && !c->event_sync_sent) {
+			// Send a request that is guaranteed to error, see comments on
+			// `event_sync` for why.
+			c->event_sync = xcb_free_pixmap(c->c, XCB_NONE).sequence;
+			c->event_sync_sent = true;
 		}
+		ret = x_poll_for_event_impl(c, queued);
+
 		if (ret == NULL || ret->response_type != 0) {
 			break;
 		}
-		x_handle_error(c, (xcb_generic_error_t *)ret);
-		free((void *)ret);
+		// We received an error, handle it and try again to see if there are real
+		// events.
+		struct x_async_request_base *head = NULL;
+		xcb_generic_error_t *error = (xcb_generic_error_t *)ret;
+		if (!list_is_empty(&c->pending_x_requests)) {
+			head = list_entry(c->pending_x_requests.next,
+			                  struct x_async_request_base, siblings);
+		}
+		if (head != NULL && error->full_sequence == head->sequence) {
+			// This is an error response to the head of pending requests.
+			c->latest_completed_request = head->sequence;
+			list_remove(&head->siblings);
+			head->callback(c, head, ret);
+		} else {
+			x_handle_error(c, error);
+		}
+		free(ret);
 	}
 	return (xcb_generic_event_t *)ret;
 }

diff --git a/src/x.h b/src/x.h
index 3d264d0795..240b64be0c 100644
--- a/src/x.h
+++ b/src/x.h
@@ -81,12 +81,6 @@ struct x_connection {
 	/// The list of pending async requests that we have
 	/// yet to receive a reply for.
 	struct list_node pending_x_requests;
-	/// The first request that has a reply.
-	struct x_async_request_base *first_request_with_reply;
-	/// A message, either an event or a reply, that is currently being held, because
-	/// there are messages of the opposite type with lower sequence numbers that we
-	/// need to return first.
-	xcb_raw_generic_event_t *message_on_hold;
 	/// Previous handler of X errors
 	XErrorHandler previous_xerror_handler;
 	/// Information about the default screen
@@ -95,8 +89,30 @@ struct x_connection {
 	/// `x_poll_for_message`. Used for sequence number overflow
 	/// detection.
 	uint32_t last_sequence;
-	/// The sequence number of `message_on_hold`
-	uint32_t sequence_on_hold;
+	/// The sequence number of the last completed request.
+	uint32_t latest_completed_request;
+	/// The sequence number of the "event sync" request we sent. This is
+	/// a request we sent that is guaranteed to error, so we can be sure
+	/// `xcb_poll_for_event` will return something. This is akin to `xcb_aux_sync`,
+	/// except that guarantees a reply, this one guarantees an error.
+	///
+	/// # Why do we need this?
+	///
+	/// To understand why we need this, first notice we need a way to fetch replies
+	/// that are already in xcb's buffer, without reading from the X connection.
+	/// Because otherwise we can't going into sleep while being confident that there
+	/// is no buffered events we haven't handled.
+	///
+	/// For events or unchecked errors (we will refer to both of them as events
+	/// without distinction), this is possible with `xcb_poll_for_queued_event`, but
+	/// for replies, there is no `xcb_poll_for_queued_reply` (ridiculous, if you
+	/// ask me). Luckily, if there is a reply already in the buffer,
+	/// `xcb_poll_for_reply` will return it without reading from X. And we can deduce
+	/// whether a reply is already received from the sequence number of received
+	/// events. The only problem, if no events are coming, we will be stuck
+	/// indefinitely, so we have to make our own events.
+	uint32_t event_sync;
+	bool event_sync_sent;
 };

 /// Monitor info
@@ -455,17 +471,13 @@ void x_request_vblank_event(struct x_connection *c, xcb_window_t window, uint64_
 /// `req` store information about the request, including the callback. The callback is
 /// responsible for freeing `req`.
 static inline void x_await_request(struct x_connection *c, struct x_async_request_base *req) {
-	if (c->first_request_with_reply == NULL && !req->no_reply) {
-		c->first_request_with_reply = req;
-	}
 	list_insert_before(&c->pending_x_requests, &req->siblings);
 }

 /// Poll for the next X event. This is like `xcb_poll_for_event`, but also includes
 /// machinery for handling async replies. Calling `xcb_poll_for_event` directly will
 /// cause replies to async requests to be lost, so that should never be called.
-xcb_generic_event_t *x_poll_for_event(struct x_connection *c);
-
-static inline bool x_has_pending_requests(struct x_connection *c) {
-	return !list_is_empty(&c->pending_x_requests);
-}
+///
+/// @param[out] queued if true, only return events that are already in the queue, don't
+///                    attempt to read from the X connection.
xcb_generic_event_t *x_poll_for_event(struct x_connection *c, bool queued);