Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch() rejection intermittently causes Deno to panic in threadsafe_functions.rs:113 #20279

Closed
tkiapril opened this issue Aug 25, 2023 · 4 comments · Fixed by #20217
Closed
Labels
bug Something isn't working correctly node native extension related to the node-api (.node)

Comments

@tkiapril
Copy link

tkiapril commented Aug 25, 2023

Credits to @eseiker who first discovered the issue and @Akamig who originally wrote this report.

[2023-08-25 03:57:09 EROR][emitter] Event emit POST request failed with an exception, blockNumber: 17948388  logIndex: 249  url: http://localhost:4000: TypeError: error sending request for url (http://localhost:4000/): error trying to connect: tcp connect error: Connection refused (os error 111)
    at async mainFetch (ext:deno_fetch/26_fetch.js:277:12)
    at async fetch (ext:deno_fetch/26_fetch.js:501:7)
    at async file:///home/ubuntu/Corvette/emitter.ts:287:34
    at async Promise.all (index 0)
    at async file:///home/ubuntu/Corvette/emitter.ts:280:36
    at async Promise.all (index 1)
    at async file:///home/ubuntu/Corvette/emitter.ts:262:22
    at async doMutex (file:///home/ubuntu/Corvette/utils/concurrencyUtils.ts:19:16)
    at async emitEvents (file:///home/ubuntu/Corvette/emitter.ts:259:7)

============================================================
Deno has panicked. This is a bug in Deno. Please report this
at https://github.com/denoland/deno/issues/new.
If you can reliably reproduce this panic, include the
reproduction steps and re-run with the RUST_BACKTRACE=1 env
var set and include the backtrace in your report.

Platform: linux x86_64
Version: 1.36.3
Args: ["deno", "run", "--allow-read", "--allow-write", "--allow-env", "--allow-run", "--allow-net", "--allow-ffi", "/home/ubuntu/Corvette/dev.ts", "--main"]

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: TrySendError { kind: Disconnected }', cli/napi/threadsafe_functions.rs:113:40
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Code causing the panic: https://github.com/planetarium/Corvette/blob/0.1.0/emitter.ts#L286-L305

When fetch() promise rejects, deno intermittently panicks, and it seems to do so while trying to unwrap the error-causing callback at threadsafe_functions.ts.

While bashing at this issue we found some other cases of Result::unwrap() causing Deno panic, so we think there's a possibility that the problem might be rooted deeper than the deno_fetch implementation.

What we tried to see if it roots somewhere in the code, and didn't help:

  • Tried not to access the error object in catch, but that didn't work.
  • Remove try-catch and use .then() and .catch(() => {})(verbatim), completely ignoring the exception, which didn't help as well.
  • Used await [...].map().reduce((promiseAcc, promise) => {const acc = await promiseAcc, x = await promise; return [...promiseAcc, x]}) instead of await Promise.all([...]) in attempt to serialize the processing, but no chance.

Possibly related and referenced issues:


A few more questions on trying to solve the problem we're experiencing: we're a bit short on the understanding of what threadsafe_functions.ts does, so it would be of great help understanding the issue if anyone (possibly the authors of this code: @littledivy, @bartlomieju, @DjDeveloperr, @ry) could explain to us what is happening in the lines here:

// Receiver might have been already dropped
let _ = tx.send(());
});
// This call should never fail
self.sender.unbounded_send(call).unwrap();

  1. [inetstack] dead_socket_tx Causes Unit Test Regressions to Fail microsoft/demikernel#246 points out on a similar unbound Sender-Reciever structure that unbound_send will fail and cause panic if the receiver is disconnected before unbound_send is executed. If this is true, why should L113 never fail, despite as commented in L109, Receiver might already have been dropped?

  2. Shouldn't unbound_send(call) be awaited first before being unwrapped?

  3. If threadsafe_functions.ts is not the culprit, what could possibly be causing this issue? As far as we know, our code surrounding the fetch() call is not anything extraordinary.

@bartlomieju bartlomieju added bug Something isn't working correctly node native extension related to the node-api (.node) labels Aug 25, 2023
@mmastrac
Copy link
Contributor

#20217 may fix this, however the node work is still outstanding. I wonder if we can port the fetch portion over.

@Akamig
Copy link

Akamig commented Sep 16, 2023

@mmastrac
Oh, from what I understand, #20217 is an attempt to fail early if the res.body is not of an unwrappable type, preventing it from reaching the Result::unwrap() context? I also agree about the porting the fetch over, regardless, this is brilliant.

@mmastrac
Copy link
Contributor

@Akamig I'll see if I can finish up #20217 over the next week. I'd definitely like to land it as it will probably clean up a lot of long tail bugs like this one.

@mmastrac mmastrac linked a pull request Sep 16, 2023 that will close this issue
@mmastrac
Copy link
Contributor

mmastrac commented Dec 1, 2023

I am actually thinking this is related to #21406 now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly node native extension related to the node-api (.node)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants