Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - fetch: smarter peer selection #6477

Closed
wants to merge 5 commits into from

Conversation

ivan4th
Copy link
Contributor

@ivan4th ivan4th commented Nov 20, 2024

Motivation

Most of the time, there are some nodes that already joined the P2P
network but didn't complete their initialization yet, and thus
attempts to e.g. fetch blobs to them end up in stream setup errors
(protocol not supported).
Additionally, during initial phases of syncv2 testing, we're going
to have a limited number of nodes supporting sync/2 protocol, and
when choosing peers for syncv2, we must only include these peers.

Description

Add a possibility to select best peers by their supported protocols,
which is used by syncv2.
When doing hash requests, only query peers that support hs/1 and as/1
protocols. This avoids querying peers that didn't finish their
initialization and thus didn't start their fetch servers yet.

Test Plan

Verify how sync works on a node

Add a possibility to select best peers by their supported protocols,
which is used by syncv2.
When doing hash requests, only query peers that support hs/1 and as/1
protocols. This avoids querying peers that didn't finish their
initialization and thus didn't start their fetch servers yet.
Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 83.87097% with 5 lines in your changes missing coverage. Please review.

Project coverage is 79.9%. Comparing base (69957b2) to head (8b3ffa9).
Report is 2 commits behind head on develop.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
fetch/fetch.go 58.3% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #6477   +/-   ##
=======================================
  Coverage     79.9%   79.9%           
=======================================
  Files          352     352           
  Lines        46410   46438   +28     
=======================================
+ Hits         37094   37124   +30     
+ Misses        7210    7209    -1     
+ Partials      2106    2105    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fetch/peers/peers.go Outdated Show resolved Hide resolved
fetch/peers/peers.go Outdated Show resolved Hide resolved
Comment on lines +159 to +162
// SelectBestWithProtocols is similar to SelectBest but filters peers by supported protocols.
// If protocols is empty, it returns the best peers regardless of the protocol.
// If protocols is not empty, it returns the best peers that support at least one of the protocols.
func (p *Peers) SelectBestWithProtocols(n int, protocols []protocol.ID) []peer.ID {
Copy link
Member

@fasmat fasmat Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fully understanding this - why only at least one protocol needs to be supported? The only place outside of tests where this is used activeSetProtocol and hashProtocol are requested, aren't both needed?

Copy link
Contributor Author

@ivan4th ivan4th Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that there may be multiple versions of syncv2 protocol deployed at some point in time: sync/2, sync/2.1 and so on. When choosing peers for multipeer sync, any of such nodes will do, it's just that different protocols will be in use, later versions being more traffic-efficient etc.

For the purpose of hash fetching, the requirement may be just one of activeSetProtocol / hashProtocol or all of them, depending on which IDs are being fetched. It maybe could make some sense to invent a more complex logic for paritioning the set of hashes to download by the protocol etc., but in practice the servers for activeSetProtocol and hashProtocol are enabled almost at the same instant at the end of the node initialization, and in a very unlikely case of a race the worst thing that may happen is current go-spacemesh behavior, that is, failed fetch which will be retried later, with failed peer getting lower priority when doing peer selection. So, in case of hash fetching, from practical standpoint it doesn't really matter whether we pass activeSetProtocol, hashProtocol or both to this function.

@ivan4th
Copy link
Contributor Author

ivan4th commented Nov 21, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Nov 21, 2024
## Motivation

Most of the time, there are some nodes that already joined the P2P
network but didn't complete their initialization yet, and thus
attempts to e.g. fetch blobs to them end up in stream setup errors
(protocol not supported).
Additionally, during initial phases of `syncv2` testing, we're going
to have a limited number of nodes supporting `sync/2` protocol, and
when choosing peers for `syncv2`, we must only include these peers.



Co-authored-by: Ivan Shvedunov <[email protected]>
@spacemesh-bors
Copy link

Build failed:

@ivan4th
Copy link
Contributor Author

ivan4th commented Nov 21, 2024

TestPartition_50_50 test hung:

2024-11-21T12:06:09.2924626Z goroutine 288 [semacquire, 58 minutes]:
2024-11-21T12:06:09.2925254Z sync.runtime_Semacquire(0xc002fae048?)
2024-11-21T12:06:09.2925915Z 	/usr/local/go/src/runtime/sema.go:71 +0x25
2024-11-21T12:06:09.2926539Z sync.(*WaitGroup).Wait(0x2c15c60?)
2024-11-21T12:06:09.2927171Z 	/usr/local/go/src/sync/waitgroup.go:118 +0x48
2024-11-21T12:06:09.2927886Z golang.org/x/sync/errgroup.(*Group).Wait(0xc000d92280)
2024-11-21T12:06:09.2928744Z 	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:56 +0x25
2024-11-21T12:06:09.2930022Z github.com/spacemeshos/go-spacemesh/systest/tests.testPartition(0xc00082a4e0, 0xc00076e500, 0xc000139c00, 0x32, 0x6)
2024-11-21T12:06:09.2931094Z 	/src/systest/tests/partition_test.go:107 +0x64b
2024-11-21T12:06:09.2932209Z github.com/spacemeshos/go-spacemesh/systest/tests.TestPartition_50_50(0xc00082a4e0)
2024-11-21T12:06:09.2933790Z 	/src/systest/tests/partition_test.go:201 +0x11c
2024-11-21T12:06:09.2934875Z testing.tRunner(0xc00082a4e0, 0x317e6b0)
2024-11-21T12:06:09.2935562Z 	/usr/local/go/src/testing/testing.go:1690 +0xf4
2024-11-21T12:06:09.2936237Z created by testing.(*T).Run in goroutine 1
2024-11-21T12:06:09.2936935Z 	/usr/local/go/src/testing/testing.go:1743 +0x390

Trying again

@ivan4th
Copy link
Contributor Author

ivan4th commented Nov 21, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Nov 21, 2024
## Motivation

Most of the time, there are some nodes that already joined the P2P
network but didn't complete their initialization yet, and thus
attempts to e.g. fetch blobs to them end up in stream setup errors
(protocol not supported).
Additionally, during initial phases of `syncv2` testing, we're going
to have a limited number of nodes supporting `sync/2` protocol, and
when choosing peers for `syncv2`, we must only include these peers.



Co-authored-by: Ivan Shvedunov <[email protected]>
@spacemesh-bors
Copy link

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title fetch: smarter peer selection [Merged by Bors] - fetch: smarter peer selection Nov 21, 2024
@spacemesh-bors spacemesh-bors bot closed this Nov 21, 2024
@spacemesh-bors spacemesh-bors bot deleted the feature/fetch-smarter-peer-selection branch November 21, 2024 13:50
@ivan4th ivan4th mentioned this pull request Nov 25, 2024
35 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants