Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix calculation and implementation of dial ratio and the max dialed connections limit #1670

Merged
merged 3 commits into from
Aug 18, 2021

Conversation

oneeman
Copy link
Contributor

@oneeman oneeman commented Aug 13, 2021

Description

Fixes two behaviors in p2p/Server which cause it to get more dialed connections than it should based on the dial ratio (which in turn impairs the ability of new nodes joining the network to find peers with open incoming connection slots):

  1. The bug described in Incorrect split between dialed and incoming peers when running a light server #1669, which affects LES servers.
  2. The fact that the dialer attempts two dials for every dialed peer connection it's missing:
    slots := (d.maxDialPeers - d.dialPeers) * 2

Initially I was planning to just fix #1669. But then when I tested it out and saw that it still ends up with more dialed peers than it should, I remembered this * 2 and realized that it would have to be removed in order to get the desired behavior.

This * 2 was introduced upstream in ethereum/go-ethereum#20592. Prior to that, it would attempt one dial for every additional dialed peer that it wanted: https://github.com/ethereum/go-ethereum/blob/58cf5686eab9019cc01e202e846a6bbc70a3301d/p2p/dial.go#L149-L160

The new behavior was introduced in that PR, but was not mentioned in the PR description, commented on in the code, nor commented on during PR review. But, seeing as it impairs the functioning of the dial ratio and geth was doing fine without it up to 1.9.10, I don't see a problem with removing it. It means that you will be slower to dial peers initially (though there is at any rate the maxActiveDials limit which defaults to 50) and if a dialed peer connection drops you will try to refill it one at a time instead of two at a time (if that makes sense), but that seems reasonable.

Other changes

Updated a test to work with the new behavior (the test assumed two dial slots for every dialed peer we want).

Tested

Verified that the dial ratio is respected by connecting a node with low limits to Alfajores and seeing that it only connects to the right number of dials peers, whether or not it's running the LES server.

Related issues

@oneeman oneeman force-pushed the oneeman/fix-maxdials-calculation branch from e3df1c2 to a0f4d6b Compare August 13, 2021 22:43
@oneeman oneeman marked this pull request as ready for review August 13, 2021 22:44
@oneeman oneeman requested a review from a team as a code owner August 13, 2021 22:44
@codecov
Copy link

codecov bot commented Aug 13, 2021

Codecov Report

Merging #1670 (0ac023c) into master (9d141cf) will increase coverage by 0.01%.
The diff coverage is 57.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1670      +/-   ##
==========================================
+ Coverage   56.06%   56.07%   +0.01%     
==========================================
  Files         605      605              
  Lines       80261    80264       +3     
==========================================
+ Hits        44999    45011      +12     
+ Misses      31731    31719      -12     
- Partials     3531     3534       +3     
Impacted Files Coverage Δ
cmd/utils/flags.go 2.86% <0.00%> (-0.01%) ⬇️
p2p/server.go 65.97% <75.00%> (+0.37%) ⬆️
p2p/dial.go 87.00% <100.00%> (ø)
eth/downloader/statesync.go 63.70% <0.00%> (-3.48%) ⬇️
p2p/enode/iter.go 90.55% <0.00%> (-2.37%) ⬇️
p2p/discover/table.go 80.46% <0.00%> (-2.34%) ⬇️
les/distributor.go 71.03% <0.00%> (-2.07%) ⬇️
consensus/istanbul/core/handler.go 81.60% <0.00%> (-1.61%) ⬇️
trie/trie.go 83.40% <0.00%> (-0.86%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d141cf...0ac023c. Read the comment docs.

Copy link
Contributor

@trianglesphere trianglesphere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a couple small comments on the tests, but otherwise it looks good to me.

p2p/dial_test.go Outdated Show resolved Hide resolved
p2p/dial_test.go Outdated Show resolved Hide resolved
@oneeman oneeman force-pushed the oneeman/fix-maxdials-calculation branch from a0f4d6b to 0ac023c Compare August 18, 2021 14:10
@oneeman
Copy link
Contributor Author

oneeman commented Aug 18, 2021

I've got a couple small comments on the tests, but otherwise it looks good to me.

Thanks! Updated those comments as well as another one I noticed (where "slots" should say "slot") in a new commit, and then I rebased on master.

@oneeman oneeman merged commit e8b6743 into master Aug 18, 2021
@oneeman oneeman deleted the oneeman/fix-maxdials-calculation branch August 18, 2021 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect split between dialed and incoming peers when running a light server
2 participants