Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backoff triggered inappropriately by context canceled errors from DHT queries. #96

Closed
zippy opened this issue Oct 18, 2017 · 2 comments · Fixed by libp2p/go-libp2p-swarm#38

Comments

@zippy
Copy link

zippy commented Oct 18, 2017

It seems like backoff is getting triggered inappropriately by context canceled errors from DHT queries.

I have been unable to get the tests to run from my own fork go-libp2p-kad-dht because of my fights with gx, however, the problem can be duplicated over on the holochain codebase which uses a variant of kat-dht.

Here is a test that reveals the problem. This test simply creates 10 nodes and has each node query the other then nodes in a nested for loop.

When I run it, the test invariably fails because [this line in swarm_dail.go] (https://github.com/libp2p/go-libp2p-swarm/blob/7edb555bea6f8c4a3183d79cedc0cd0ebced06da/swarm_dial.go#L188) gets called after a context canceled error, which I'm pretty sure is coming from the query, and is interpreted as a connection failure instead of the cancellation that it really is.

As an experiment I change that line to:

if err.Error() != "context canceled" {
	s.backf.AddBackoff(p) // let others know to backoff
}

And then my test runs just fine, which indicates that nodes are actually all available for connection it's just that the backoff from the cancel causes a subsequent connection to the same node that was cancelled be treated incorrectly as a network failure.

@Stebalien
Copy link
Member

Hm; this is going to be annoying to fix properly. For example, we can end up setting extremely high delays on all known peers if we lose connectivity for a bit (and, honestly, we should be backing off peers on a per address basis, not on a per peer basis). However, your solution will fix the immediate problem; thanks!

Future work will continue here: libp2p/go-libp2p#1554

@Stebalien
Copy link
Member

Fix here: libp2p/go-libp2p-swarm#38

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants