Summary:
Due to an unknown bug, a shared exchange query could end up never being notified about flush completion.
As a result, LockablePgClientSession could wait on a latch forever, so that shared exchange thread could never complete.
This in turn blocks the thread that executes CheckExpiredSessions.
When enough such threads accumulate, all poller threads become blocked and CheckExpiredSessions can no longer run.
Consequently, no other pg client sessions can be destroyed either, because their cleanup relies on that same poller.
The following changes address this issue:
1) Replaced the latch with a new state in the exchange, so the thread can finish and be joined even if the flush never responds.
2) Postponed joining exchange threads that have not finished yet, so a hang in a single exchange thread no longer blocks other threads.
3) Added a check to the SharedExchangeQuery destructor that a response was sent.
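Items 1 and 2 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual C++ change. `Exchange`, `ExchangeState`, and `DeferredJoiner` are hypothetical names, and `shutdown()` stands in for whatever path tells the exchange to stop waiting. Unlike a latch, the wait here also ends on a terminal state other than success, so the owning thread can always exit and be joined.

```python
import threading
from enum import Enum, auto


class ExchangeState(Enum):
    # Hypothetical states; the real exchange's states are not spelled out in the commit.
    WAITING_FLUSH = auto()  # request handed off, flush response still pending
    FLUSHED = auto()        # flush completion was reported
    SHUTDOWN = auto()       # owner stopped waiting (e.g. the session is being destroyed)


class Exchange:
    """Hypothetical exchange: an explicit state guarded by a condition variable instead of a latch."""

    def __init__(self):
        self._cond = threading.Condition()
        self._state = ExchangeState.WAITING_FLUSH

    def _finish(self, new_state):
        with self._cond:
            # Only the first terminal transition wins; later ones are ignored.
            if self._state is ExchangeState.WAITING_FLUSH:
                self._state = new_state
                self._cond.notify_all()

    def flush_done(self):
        self._finish(ExchangeState.FLUSHED)

    def shutdown(self):
        self._finish(ExchangeState.SHUTDOWN)

    def wait(self):
        # Returns once the flush completes OR shutdown() is called, so the waiting
        # thread can finish even if the flush never responds.
        with self._cond:
            self._cond.wait_for(lambda: self._state is not ExchangeState.WAITING_FLUSH)
            return self._state


class DeferredJoiner:
    """Hypothetical helper: joins only threads that already finished, keeps the rest for later."""

    def __init__(self):
        self._pending = []

    def add(self, thread):
        self._pending.append(thread)

    def join_finished(self):
        still_running, joined = [], 0
        for t in self._pending:
            if t.is_alive():
                still_running.append(t)  # a hung thread no longer blocks joining the others
            else:
                t.join()
                joined += 1
        self._pending = still_running
        return joined

    def pending_count(self):
        return len(self._pending)
```

In this sketch a caller would invoke `join_finished()` periodically (for example from the expiry check) instead of blocking on every thread at once.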
It is also possible that there is no other bug, and the actual issue was caused by the following deadlock.
All poller threads were trying to join exchange threads.
The requests those exchange threads were waiting on could not complete, because they use the same pool as the poller.
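The suspected deadlock is ordinary thread-pool starvation. Every worker blocks waiting on work that can only run on that same pool. Below is a small Python sketch of it, using `concurrent.futures.ThreadPoolExecutor` as a stand-in for the poller's pool. `poller_task` and `run` are hypothetical names. A timeout replaces the indefinite wait so the demo terminates, and barriers make the interleaving deterministic.

```python
import concurrent.futures as cf
import threading


def poller_task(request_pool, started, finished, timeout):
    # All pollers reach this barrier before any request is submitted,
    # so every worker of the poller pool is busy at that point.
    started.wait()
    # Hand a request to request_pool and block on it, the way a poller thread
    # blocks while joining an exchange thread (hypothetical stand-in).
    fut = request_pool.submit(lambda: "response")
    try:
        outcome = fut.result(timeout=timeout)
    except cf.TimeoutError:
        outcome = "starved"
    # Keep this worker occupied until every poller has its outcome, so a worker
    # freed by an early timeout cannot rescue a slower poller's request.
    finished.wait()
    return outcome


def run(shared_pool, workers=2, timeout=0.5):
    poller_pool = cf.ThreadPoolExecutor(max_workers=workers)
    # shared_pool=True reproduces the deadlock: requests queue behind blocked pollers.
    request_pool = poller_pool if shared_pool else cf.ThreadPoolExecutor(max_workers=workers)
    started = threading.Barrier(workers, timeout=5)
    finished = threading.Barrier(workers, timeout=5)
    try:
        futures = [poller_pool.submit(poller_task, request_pool, started, finished, timeout)
                   for _ in range(workers)]
        return [f.result() for f in futures]
    finally:
        poller_pool.shutdown(wait=True)
        if request_pool is not poller_pool:
            request_pool.shutdown(wait=True)
```

With a shared pool every poller reports `"starved"`. With a separate request pool every poller gets `"response"`. Without the timeout, the shared case would hang indefinitely.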
Jira: DB-14306
Original commit: 9b40a42446594bc0050c51a514946b04acd5eaf5/D40489
Test Plan: Jenkins
Reviewers: esheng
Reviewed By: esheng
Subscribers: ybase, yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D40658