backoff triggered inappropriately by `context canceled` errors from DHT queries #96
It seems like backoff is getting triggered inappropriately by `context canceled` errors from DHT queries. I have been unable to get the tests to run from my own fork of go-libp2p-kad-dht because of my fights with gx; however, the problem can be duplicated over on the holochain codebase, which uses a variant of kad-dht.
Here is a test that reveals the problem. It simply creates 10 nodes and has each node query the other nodes in a nested for loop, as sketched below.
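The original test code isn't reproduced here, but a minimal sketch of its shape might look like the following, written against today's go-libp2p-kad-dht API rather than the gx-era one this report used. The test name, the star topology through node 0, and the omission of routing-table warm-up are all my assumptions, not the original test:

```go
package dhttest

import (
	"context"
	"testing"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// TestAllPairsFindPeer spins up n nodes and has every node look up
// every other node in a nested loop, mirroring the test described above.
func TestAllPairsFindPeer(t *testing.T) {
	ctx := context.Background()
	const n = 10

	hosts := make([]host.Host, n)
	dhts := make([]*dht.IpfsDHT, n)
	for i := 0; i < n; i++ {
		h, err := libp2p.New()
		if err != nil {
			t.Fatal(err)
		}
		t.Cleanup(func() { h.Close() })
		d, err := dht.New(ctx, h, dht.Mode(dht.ModeServer))
		if err != nil {
			t.Fatal(err)
		}
		hosts[i], dhts[i] = h, d
	}

	// Connect everyone to node 0 so routing tables can populate.
	for i := 1; i < n; i++ {
		err := hosts[i].Connect(ctx, peer.AddrInfo{
			ID:    hosts[0].ID(),
			Addrs: hosts[0].Addrs(),
		})
		if err != nil {
			t.Fatal(err)
		}
	}

	// Nested loop: every node queries every other node.
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			if i == j {
				continue
			}
			if _, err := dhts[i].FindPeer(ctx, hosts[j].ID()); err != nil {
				t.Fatalf("node %d failed to find node %d: %v", i, j, err)
			}
		}
	}
}
```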
When I run it, the test invariably fails because [this line in swarm_dial.go](https://github.com/libp2p/go-libp2p-swarm/blob/7edb555bea6f8c4a3183d79cedc0cd0ebced06da/swarm_dial.go#L188) gets called after a `context canceled` error, which I'm pretty sure is coming from the query, and is interpreted as a connection failure instead of the cancellation that it really is. As an experiment I changed that line so that a canceled context no longer counts as a dial failure.
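The exact replacement isn't shown above, but the gist of the experiment is to stop feeding context cancellations into the backoff table. A minimal, self-contained sketch of that condition, assuming the swarm records backoff via something like its `AddBackoff` call at that line: `shouldBackoff` here is a hypothetical helper for illustration, not real swarm code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// shouldBackoff shows the condition the experiment adds: a dial that
// died because the caller canceled its context is not evidence that
// the peer is unreachable, so it must not trigger backoff.
func shouldBackoff(ctx context.Context, dialErr error) bool {
	if dialErr == nil {
		return false // the dial succeeded
	}
	if errors.Is(dialErr, context.Canceled) || ctx.Err() != nil {
		return false // the caller gave up; not a network failure
	}
	return true // a genuine dial failure: back off this peer
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the DHT query canceling its dials

	fmt.Println(shouldBackoff(ctx, ctx.Err()))                              // false: no backoff
	fmt.Println(shouldBackoff(context.Background(), errors.New("refused"))) // true: back off
}
```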
And then my test runs just fine, which indicates that the nodes are actually all available for connection; it's just that the backoff from the cancellation causes a subsequent connection attempt to the same node to be treated, incorrectly, as a network failure.
Comments

Fix here: libp2p/go-libp2p-swarm#38

Hm; this is going to be annoying to fix properly. For example, we can end up setting extremely high delays on all known peers if we lose connectivity for a bit (and, honestly, we should be backing off peers on a per-address basis, not on a per-peer basis). However, your solution will fix the immediate problem; thanks! Future work will continue here: libp2p/go-libp2p#1554
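To make the per-address point concrete: a backoff table keyed by multiaddr would only penalize the address that failed, leaving the peer's other addresses dialable. A hypothetical sketch; `addrBackoff` and its methods are invented for illustration and are not the swarm's actual `DialBackoff`:

```go
package main

import (
	"fmt"
	"sync"
	"time"

	ma "github.com/multiformats/go-multiaddr"
)

// addrBackoff tracks backoff per address rather than per peer, so one
// unreachable address does not blacklist every address a peer has.
type addrBackoff struct {
	mu    sync.Mutex
	until map[string]time.Time // keyed by multiaddr string
}

func newAddrBackoff() *addrBackoff {
	return &addrBackoff{until: make(map[string]time.Time)}
}

// Backoff records a dial failure against a single address.
func (b *addrBackoff) Backoff(a ma.Multiaddr, d time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.until[a.String()] = time.Now().Add(d)
}

// Skip reports whether this one address is still in backoff.
func (b *addrBackoff) Skip(a ma.Multiaddr) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return time.Now().Before(b.until[a.String()])
}

func main() {
	bad, _ := ma.NewMultiaddr("/ip4/10.0.0.1/tcp/4001")
	good, _ := ma.NewMultiaddr("/ip4/192.168.1.2/tcp/4001")

	b := newAddrBackoff()
	b.Backoff(bad, time.Minute) // one failed address

	fmt.Println(b.Skip(bad))  // true: skip only the failed address
	fmt.Println(b.Skip(good)) // false: the peer's other address is fine
}
```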