Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsyncCancelled in Warp #845

Closed
kazu-yamamoto opened this issue May 21, 2021 · 13 comments
Closed

AsyncCancelled in Warp #845

kazu-yamamoto opened this issue May 21, 2021 · 13 comments

Comments

@kazu-yamamoto
Copy link
Contributor

In my QUIC implementations, a QUIC connection consists of multiple Haskell threads. To avoid leaking threads, race and cuncurrently of the async library are used.

To implement an HTTP server over QUC, Warp is launched in a QUIC runner function. In this use case, Warp MUST NOT ignore AsyncCancelled. If Warp ignores it, threads are leaked.

Currently SomeException is ignored in many places of Warp. I would like to modify it so as at least AsyncCanceled is re-thrown. It would be easy to fix every part of SomeException but it is awkward.

@snoyberg I think you are an expert of exception handling. Could you suggest an elegant solution? For instance, introducing a type which does not match AsyncCancelled instead of SomeException.

@snoyberg
Copy link
Member

I'd replace all usage's of try with tryAny. Similar with catch, handle, and other exception catching mechanisms. The basic idea is that, when there's an async exception, we should clean up after it, but not recover from it.

If you're concerned about adding a new dependency to warp, I can add a Warp.Exception or similar module which implements this logic and move the rest of the code over to use it, if that would be helpful. What do you think?

@kazu-yamamoto
Copy link
Contributor Author

No objection to add a new dependency. Let's use excellent libraries. Could you take care of this issue?

@alaendle
Copy link

alaendle commented Jun 17, 2021

@kazu-yamamoto, @snoyberg To be honest I haven't completely verified my assumption, but my guess is that this or a related change now exposes "Thread killed by timeout manager" exceptions if a UncaughtExceptionHandler is in place - so at least in my case a the update from lts-17.15 (warp 3.3.14) to lts-18.0 (warp 3.3.16) breaks my logic to try a graceful shutdown of the process if an uncaught async exception is occurring. Please let me know if this was a wanted or unwanted side-effect. Means, do I need to adapt and ignore these exceptions, because they are used as an interruption signal and "crashing" these threads is intended; or may I hope for a release of the library that "wraps" the threaded function in something that swallows these exceptions? I hope the problem got clear enough - please let me know if I was unclear (I'm still a haskell beginner). Just one last note, I find it very hard if a UncaughtExceptionHandler need to decide which exceptions are acceptable (e.g. here as a signal to cancel a thread) and which indicate a potential problem (because a thread just crashed unexpectedly).

@snoyberg
Copy link
Member

I'm sorry, I don't understand the problem you're describing.

@alaendle
Copy link

@snoyberg Many thanks for your answer. I was very vague yesterday, but I had some time for additional investigation on the behavioural change I spotted between warp 3.3.14 and 3.3.16.

I hope I do a better job presenting the problem now:

The easiest way to detect the changed behaviour - add a handler for uncaught exceptions that prints out any exception (via setUncaughtExceptionHandler) and server some content via run. Now just telnet into your server and wait for the timeout.

andreas@andreas:~$ telnet localhost 8000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
andreas@andreas:~$ 

Result with warp 3.3.14 - No exceptions is passed to the custom uncaught exception handler.
Result with warp 3.3.16 - Thread killed by timeout manager is printed to the console.

Hope this (together with my previous comment) make sense now. If things are still unclear I could try to extract a minimal code-set that directly reproduces the described change.

@chrisdone
Copy link

I've just upgraded from resolver: lts-15.7 to resolver: lts-18.0 and saw the following output:

ConnectionClosed
ensemble-server: Thread killed by timeout manager

Previously, I would see just ConnectionClosed. This is triggered, I believe, by my websocket handler attempting to send a heartbeat message on a closed connection. That's fine. The yesod handler looks like this:

getAdminWSR :: Handler ()
getAdminWSR = webSockets interaction where
 interaction = race_ sendUpdates sendHeartbeats

When the browser closes the websocket, this ConnectionClosed is thrown by the websocket library. The new part is Thread killed by timeout manager. I've traced that to here which is triggered by fork. Beyond that, it seems a level of detail below the concerns of users of Yesod.

Playing some more, I see that more browsing and closing tabs and then waiting a few minutes triggered:

ensemble-server: Thread killed by timeout manager
ensemble-server: Thread killed by timeout manager
ensemble-server: Thread killed by timeout manager
ensemble-server: Thread killed by timeout manager
ensemble-server: Thread killed by timeout manager
ensemble-server: Thread killed by timeout manager

I'm not really sure exactly why that is.

This doesn't stop the server from functioning, however.

Hunch: perhaps the webSockets function or the wai-websockets library is catching exceptions, including those thrown to the thread, which wai uses for its own internal mechanics? Hard to tell.

@chrisdone
Copy link

Perhaps #839 is related to the issue I'm seeing. I'll move my discussion there.

@chrisdone
Copy link

I moved my issue here: #850

@kazu-yamamoto
Copy link
Contributor Author

@snoyberg Control.Exception.throwTo is used in time-manager. I guess we should use UnliftIO.Exception.throwTo.

@snoyberg
Copy link
Member

I think we should move this discussion over to #850. And I'm not convinced switching to UnliftIO.Exception.throwTo is the full solution here

@snoyberg
Copy link
Member

@alaendle it would be much easier to be sure we're talking about the same issue here if you provide a minimal repro. Perhaps you can add it to #850 like @chrisdone has?

@alaendle
Copy link

@snoyberg - as my problem is already solved with #851 I think a reproduction case no longer makes sense? Again many thanks for your work!

@snoyberg
Copy link
Member

Agreed, no need for a repro, thanks for confirming the fix worked!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants