Block sync performance unreliable on poor Internet connections #2587
Comments
I'm also getting lots of |
Interestingly, there also seems to be no affinity with local nodes that can provide the data, i.e. if I stand up local nodes (on other machines) on the same LAN, these too are lost in the same manner unless I add them manually to a Ideally, if one or two local nodes are fully synced they would be the primary providers of data, with global back-hauling used only to ensure the data is not corrupt / censored. I'll set up a play pit and have a look at what is going on - though I'm 2 days behind now and can't catch up... |
My experience with geth on a poorly connected machine is the same: extreme brittleness and a lack of resiliency. Even when the network comes back, block synchronization does not. |
NOTE: All switching occurred while the node was active; I did not start/stop/resume. I did some syncing using the Network Link Conditioner at various settings. Starting at Very Bad Internet, which basically means a 500ms delay, bandwidth limited to 1mbps and 10% packet loss, the node wouldn't sync. Switching to High Latency DNS obviously didn't pose much of a problem. Switching to Very Bad Internet again almost instantly dropped all peers and syncing didn't resume (it also threw in a Rolled back 2048 headers). After switching to 3G mode (780kbps / 330kbps) the download eventually resumed, but it is literally crawling forth, exactly what you'd expect of a 3G network. Considering you're living in NZ with very few nodes near you this would be somewhat expected. What would help us if you'd run the node as |
@obscuren okay thanks. I'll do that and post back results. |
@avastmick Could you run a speed test via http://www.speedtest.net/ with a European target and send us your specs? I'm really curious what the network latency is (bandwidth too of course). |
@karalabe Okay here we go - http://www.speedtest.net/my-result/5347272962 - target was SW UK. Now, with these stats I get that it's going to be sloooowwww to sync. I know this (trust me); that's not the point. The point is the fragility for connections like this. If it takes days to sync, that's okay - that's what my Thanks for looking into this. |
Currently we have quite strict timeouts in place, which are aimed at ensuring some connectivity guarantees and avoiding stalling attacks. Unfortunately, if you yourself are a very remote node with no other peers close by, this essentially causes you to consider everyone else a bad peer :D We're considering adding some CLI flag that would bump the timeouts; we just need to make sure we don't affect performance and/or security adversely. |
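To illustrate the effect described in this comment, here is a minimal sketch comparing a fixed request timeout with one derived from measured round-trip times. It is not geth's downloader logic: the smoothing scheme is borrowed from TCP's retransmission timer (RFC 6298), the 3-second fixed timeout is an arbitrary figure, and the names `RttEstimator`, `fixed_drops` and `adaptive_drops` are hypothetical.

```python
class RttEstimator:
    """Smoothed round-trip-time tracker in the style of RFC 6298 (TCP RTO)."""

    def __init__(self, initial=3.0, floor=1.0, alpha=0.125, beta=0.25):
        self.initial = initial  # timeout used before any sample exists (assumed)
        self.floor = floor      # never go below this many seconds
        self.alpha = alpha      # gain for the smoothed RTT
        self.beta = beta        # gain for the RTT variance
        self.srtt = None
        self.rttvar = None

    def observe(self, rtt):
        """Fold one measured round trip (in seconds) into the estimate."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt

    def timeout(self):
        """Current timeout: smoothed RTT plus four times the variance."""
        if self.srtt is None:
            return self.initial
        return max(self.floor, self.srtt + 4 * self.rttvar)


def fixed_drops(rtts, fixed_timeout):
    """Count replies that a fixed timeout would treat as failures."""
    return sum(1 for rtt in rtts if rtt > fixed_timeout)


def adaptive_drops(rtts, initial_timeout):
    """Count replies that an RTT-derived timeout would treat as failures."""
    estimator = RttEstimator(initial=initial_timeout)
    drops = 0
    for rtt in rtts:
        if rtt > estimator.timeout():
            drops += 1
        # Assumption: a late reply still arrives and yields an RTT sample.
        estimator.observe(rtt)
    return drops
```

With round trips of 3.5, 4.0, 3.8 and 4.2 seconds, `fixed_drops(rtts, 3.0)` counts every reply as a failure, while `adaptive_drops(rtts, 3.0)` counts only the first. A real implementation would presumably also cap the adaptive timeout so that a stalling peer cannot stretch it indefinitely.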
Having repeat "Synchronisation failed: no peers to keep download active" messages here in Michigan, too. This has been quite annoying today. |
All these issues should be solved by our latest 1.4.6 release. Please try with that and open a new ticket if some issues still persist. |
UPDATE: there is another roll-up issue #2569 that focusses in on the errors / bugs seen in fast syncing (the --fast flag on initial synchronisation). To clarify, the issues I am seeing are present regardless of how the syncing was initiated.
System information
Expected behaviour
Consistent block synchronisation regardless of quality of network connection. Robust recovery from loss of connection. Suitable messaging to user for sync failure.
Actual behaviour
I live in rural New Zealand and work over (barely) "broadband" ADSL that is highly contended, which leads to dropped connections, slow connections and high latency. The latter seems to cause considerable issues with Geth finding and retaining peers. So if I loop net.peerCount, across requests made 10 seconds apart I can get: [0, 3, 10, 1, 1, 1, 8, 8, 3, 3, 0, 0, 1...] and so on. After a period of low to zero peers the client no longer continues to sync (no log entry beyond the Synchronisation failed: message). Restart the client and syncing restarts... rinse and repeat.
Occurs on both Windows 10 and Ubuntu 16.04 geth clients. The latter is also mobile (laptop). Same results, worse in busy areas (again a contention ratio issue).
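The peer-count loop described above can be approximated outside the geth console. This is a minimal sketch, assuming the node exposes its HTTP JSON-RPC endpoint at http://127.0.0.1:8545 (an assumed address) and reading the report as one sample every 10 seconds. `net_peerCount` is the standard JSON-RPC method and returns a hex string; the helper names `peer_count`, `longest_zero_run` and `watch` are illustrative only.

```python
import json
import time
import urllib.request

RPC_URL = "http://127.0.0.1:8545"  # assumption: local geth with HTTP RPC enabled


def peer_count(url=RPC_URL, timeout=10.0):
    """Ask the node for its current peer count via JSON-RPC net_peerCount."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "net_peerCount", "params": [], "id": 1}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The result is a hex-encoded quantity such as "0x8".
        return int(json.load(response)["result"], 16)


def longest_zero_run(samples):
    """Length of the longest consecutive run of zero-peer samples."""
    longest = current = 0
    for count in samples:
        current = current + 1 if count == 0 else 0
        longest = max(longest, current)
    return longest


def watch(interval=10.0, rounds=13):
    """Sample the peer count every `interval` seconds and print each reading."""
    samples = []
    for _ in range(rounds):
        try:
            samples.append(peer_count())
        except OSError:
            samples.append(0)  # treat an unreachable node as zero peers
        print(samples[-1])
        time.sleep(interval)
    return samples
```

Calling `watch()` against a running node prints one reading per interval. Passing the returned samples to `longest_zero_run` gives a rough measure of how long the node sat without peers before a stall.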
If I run the same on a Digital Ocean droplet, I get a steady peer count, no stalls, and a full (not --fast) sync in a couple of hours.
Steps to reproduce the behaviour
Run node on poor network connection - need either 3G mobile in busy area, or a high contention-ratio ADSL.
I'd like to add to this - for information. Geth seems very sensitive to unreliable networks or high contention networks. I get stalling / hanging when at home or on 3G / 4G mobile.
This is an issue as it tends to mean that only low-latency, low-contention ratio connections work and this will undermine the goal of meshing and potential usage in non-developed locations.