Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autorelay: curtail addrsplosion #598

Merged
merged 42 commits into from
Apr 20, 2019
Merged

autorelay: curtail addrsplosion #598

merged 42 commits into from
Apr 20, 2019

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented Apr 13, 2019

Relays, as discovered in the DHT, are addrsploded. The current (pre-)computation of advertised relay addrs fails to account for this fact, resulting in the advertisement of many dead addresses. In addition, many relays discovered in the DHT are actually dead.

This patch adds logic to deal with the situation

  • relay addrs are dynamically recomputed by the address factory using the relay addrs present in the peerstore instead of pre-computing based on what was advertised.
  • a special clean up pass is done on the relay address set to remove addrsploded addrs.
  • a retry pass is added to findRelays that deals with the case of no live relays in the discovered set.
  • we also increase the relay advertisement boot delay to 15 min so that some minimum node stability is ensured.

Note: the patch was originally much more extensive, adding a complex relay selection logic that tried to parallelize to deal with dead relays. That proved to be contentious, and biased against dedicated relays, so it has been reverted.

@ghost ghost assigned vyzo Apr 13, 2019
@ghost ghost added the status/in-progress In progress label Apr 13, 2019
@vyzo vyzo requested a review from Stebalien April 13, 2019 19:37
@vyzo
Copy link
Contributor Author

vyzo commented Apr 14, 2019

I think we might want to try parallel connects, as many of the relays in the discovered set (I have 183) appear to be dead.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 14, 2019

If not parallel connects, at least parallel FindPeers.

@vyzo vyzo changed the title Increase autorelay discovery limit to 1k Increase autorelay discovery limit to 1k and curtail addrsplosion Apr 14, 2019
@vyzo
Copy link
Contributor Author

vyzo commented Apr 14, 2019

Added a relay address set clean up phase to curtail addrsplosion.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 14, 2019

Maybe 15s to FindPeer and Connect is too aggressive though, might be better served with a 30s timeout.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 14, 2019

I will revert it to 30s.

@vyzo vyzo force-pushed the feat/moar-relays branch from 5359653 to aa03a9b Compare April 14, 2019 09:58
@vyzo vyzo changed the title Increase autorelay discovery limit to 1k and curtail addrsplosion autorelay: rework relay selection logic to deal with dead relays and increasing relay set size Apr 14, 2019
@vyzo vyzo changed the title autorelay: rework relay selection logic to deal with dead relays and increasing relay set size autorelay: rework relay selection logic to deal with dead relays and curtail addrsplosion Apr 14, 2019
@vyzo
Copy link
Contributor Author

vyzo commented Apr 14, 2019

Reworked the relay selection logic to perform FindPeer in parallel.

@vyzo vyzo requested a review from raulk April 14, 2019 11:54
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. Please ping me on IRC/slack when you want me to review this again (trying to shorten my review cycles a bit..).

p2p/host/relay/addrsplosion.go Outdated Show resolved Hide resolved
p2p/host/relay/addrsplosion.go Show resolved Hide resolved
p2p/host/relay/addrsplosion.go Show resolved Hide resolved
ma.ForEach(a, func(c ma.Component) bool {
switch c.Protocol().Code {
case ma.P_TCP, ma.P_UDP:
port = int(binary.BigEndian.Uint16(c.RawValue()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this would only bother with the first port (although multiple ports isn't currently an issue except for relay addresses which we should be ignoring anyways).

p2p/host/relay/addrsplosion.go Show resolved Hide resolved
p2p/host/relay/autorelay.go Outdated Show resolved Hide resolved
case qr := <-resultCh:
rcount++
if qr.err == nil {
pi := cleanupAddressSet(qr.pi)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean we won't advertise addresses like /ip4/.../tcp/ADDRSPLODE/p2p/QmRelay/p2p-circuit/p2p/QmMe? Got it.

case qr := <-resultCh:
rcount++
if qr.err == nil {
pi := cleanupAddressSet(qr.pi)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, if we're going through all this trouble for that, I really wish we could just wait for the relay to tell us its addresses.

Note: If we actually had a reserve, the relay would return the addresses it wants us to use in response to the reserve command.

(but oh well)

@vyzo vyzo requested a review from Stebalien April 19, 2019 11:14
vyzo added 2 commits April 19, 2019 19:48
…ss set

findRelays cleans up address sets now for addrsplosion, and this removes private
addrs as well.
@Stebalien
Copy link
Member

WRT #598 (comment)

The bad peer addresses will already get put into the peerstore anyways so this patch won't actually help fix dialing relays. This patch will allow us to trim down the address list auto-relay users announce but a better solution might be to re-calculate /p2p-cirucit addresses dynamically from the peerstore. That way, we get the latest addresses sent by our relay's identify service.

In terms of this code, @vyzo and I discussed generalizing it and using it from within either the dialer and/or the peerstore. Specifically, we can use the addrsplosion detection logic to trim down the set of addresses we're dialing. However, the current heuristics around circuit and private addresses obviously won't generalize (although we should drop addresses of the form /ip4/private/tcp/.../p2p-circuit/...).

So, the question is, is merging this patch worth it? IMO, not really given that it doesn't affect dialing and there's an (IMO) better option on the advertising end.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 19, 2019

So the patch fixes two problems with the current code:

  • It fixes the addrsplosion in the announced addrs, which is a real problem as most of the address sets returned by discovery are addrsploded (this has been experimentally determined)
  • It fixes the handling of dead peers in the discovery set, which are processed serially and could result to very long times in determining a relay to use.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 19, 2019

Wrt to moving forward, we can stop tracking the relay addrs discovered by FindPeer and use the set returned by identify after connecting (that is when constructing the address set in updateAddrs).
We will still have to clean it up though, as we might have garbage in the peerstore from the queries.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 19, 2019

I have updated the code to dynamically compute the relay address set on demand.

@vyzo vyzo force-pushed the feat/moar-relays branch from 31c8d32 to 5c9299a Compare April 19, 2019 22:53
@vyzo
Copy link
Contributor Author

vyzo commented Apr 20, 2019

I have significantly simplified the patch, removing the contentious parallel query selection logic.
The behaviour wrt to relay selection and dialing matches the old behaviour now.

the presence of stashed query results from discovery in the peerstore _biases_ the
selection towards fully DHT nodes, which penalizes dedicated relays.
@vyzo vyzo force-pushed the feat/moar-relays branch from 96bcbd2 to 8d073ce Compare April 20, 2019 10:34
@vyzo vyzo changed the title autorelay: rework relay selection logic to deal with dead relays and curtail addrsplosion autorelay: curtail addrsplosion Apr 20, 2019
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct.

@Stebalien Stebalien merged commit 9fc5c96 into master Apr 20, 2019
@ghost ghost removed the status/in-progress In progress label Apr 20, 2019
@Stebalien Stebalien deleted the feat/moar-relays branch April 20, 2019 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants