This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Sync procedure is fundamentally broken #5270
Comments
What out-of-sync timeframe are you talking about? I can easily catch up after being out of sync for 6 hours with an old i5 processor and a low-budget SSD. |
To me it looks like it depends on your "luck" and on the peers you connected to before you reached the peer limit. |
This is an example from a farmer node (no harvesters):
47 seconds per block. |
Yes, that sounds legit. Take a look at #3298. I am currently trying to hard-ban peers that I cannot connect to, so that the garbage collector can remove them from ChiaServer's internal peer list. |
Also, chia_full_node is CPU-bound: it shows 100% CPU usage during this terrible sync. |
Another thing I see in the logs: I am feeding known blocks to other nodes at the speed of light, and this is probably one of the reasons for the CPU bottleneck. |
I have the same problem: I got out of sync a few days ago and have been unable to regain sync. After multiple restarts and a complete DB delete, I am only a little over half synced after 24 hours. |
One more thing: it looks like sometimes it just gets stuck on a particular block, I presume waiting for something from a particular peer and not getting it. |
At the moment this solution works... I think this is purely a peer problem. If I am lucky, sync works great; with a bad peer I am not synced for 6 hours. Synchronization is totally broken. This isn't a problem with the DB, CPU, SSD, or network. Only the app has a big problem. Manually restarting the wallet can help, but it affects farming... |
Can you point me to the procedure? |
I am working on it... this is a sample :) The app looks like it uses only the first added peer, but that is a bad solution for farming. |
What sync speed do you get in this case, in seconds per block? |
40 s per block, as long as I'm lucky. Only a wallet restart helps for me. |
That is still slower than the chain grows. |
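The comparison above (40 s per block still being slower than the chain grows) can be made concrete with a little arithmetic. This is a minimal sketch, not anything from the Chia codebase: the function name and parameters are hypothetical, and the 18.75 s average block interval is an assumption based on Chia's stated target of 4608 blocks per day, not a figure from this thread.

```python
import math

# ASSUMPTION: average seconds between new blocks on the network.
# 86400 s/day divided by a 4608 blocks/day target gives 18.75 s; adjust as needed.
ASSUMED_CHAIN_S_PER_BLOCK = 86400 / 4608


def catch_up_seconds(backlog_blocks, sync_s_per_block,
                     chain_s_per_block=ASSUMED_CHAIN_S_PER_BLOCK):
    """Seconds until a syncing node reaches the tip, or math.inf if it never does."""
    sync_rate = 1.0 / sync_s_per_block    # blocks the node processes per second
    chain_rate = 1.0 / chain_s_per_block  # blocks the network adds per second
    net_rate = sync_rate - chain_rate     # how fast the backlog shrinks
    if net_rate <= 0:
        return math.inf                   # the backlog grows or stays constant
    return backlog_blocks / net_rate


if __name__ == "__main__":
    for rate in (47.0, 40.0, 2.0, 1.107):  # sync speeds quoted in this thread
        print(rate, "s/block ->", catch_up_seconds(1000, rate), "s for a 1000-block backlog")
```

Under that assumed interval, any sync speed slower than about 18.75 s per block never closes the gap, which would be consistent with the 40 to 47 s per block reports above.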
Protocol and software are terminally ill. |
I managed to squeeze it down to 2 seconds per block (even less), but I will not publish this, as it would kill everything once it gets out into the wild. |
Yes, |
There is more to that. |
Would you share your thoughts about the architectural mistakes? |
It looks like you either pull or give: once you share with peers who have less than you, your own sync suffers a lot. There is an architectural bottleneck somewhere in the software. Most likely this is related to the CPU bottleneck I see. But if people stop sharing, then Chia is as good as dead. I am approaching a sync speed of 1 s/block with a quite specific setup, which is decent. But then again, if the number of transactions grows 20-fold, nobody will be able to keep up, from the looks of it :) |
From what I also see, most of the people suffering from the sync issue (that I've seen so far) are from Europe, where people got on board later. So there is a huge pool of people who want to get synchronized, and possibly they are introduced to the "closest" European peers as the shortest route, which kills it for Europe ;) |
libtorrent works better in eu:) |
Having kept a really close eye on it today: in the past 12 hours I've only been able to farm successfully for 1 hour. It shows "Not Synced", even though the connections to peers are established. Then it randomly syncs, farms for a short while, and dies again. All the tricks listed (manually connecting, clearing the historic connection data, etc.) aren't working that well anymore. I'm getting a little frustrated with it, which makes me wonder whether there is any point in continuing to plot at the moment. As more users join the network, it will creak further until it dies, unless something is urgently done to fix the issue. It's pointless plotting if you can't farm your plots reliably... |
This is what has been happening for 2 weeks already, since the network got bigger. And yes, the bigger it gets, the lower the chance it will work. Unsynced peers pulling blocks from you kill your full_node, and it can't keep up with the network anymore. Then it gets some air to breathe and sometimes manages to get in sync, but then it gets killed again, as it becomes an attractive source of fresh blocks. |
Same here ... |
We need a team to write a new implementation, as I doubt we can get much further with the current one. The performance target was completely missed. |
With my test setup I have managed to get the sync speed as low as 1.107 s per block so far (at the end of the chain, with the fattest blocks), despite doing nothing specific about "bad peers". |
I've tested connecting directly to a friend who is also running Chia. Connected, with no bandwidth or latency issues between my machine and his. He's up to date. It just can't seem to process the data fast enough to get in sync. |
Yep. CPU bottleneck. |
stuck again at "85 blocks behind" :) and now my log got full of and farming status switched from "Sync" to "Not synced or not connected to peers" |
It is not about what you run; it is about what some peers run. There are a number of guides and add-ons for Chia that recommend INFO. |
ok, now i understand |
so, end of story. at "-77 blocks mark" I have 108 active peers, all of them have newer blocks, but ALL of them gave me |
literally, ALL of them refused me |
2021-05-17T20:03:21.395 full_node chia.full_node.full_node: WARNING Invalid response for slot None |
and multi: |
How do you see those? |
still managed to get to "-65". |
from debug.log file |
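For anyone wanting to tally such entries rather than scroll through the file, here is a minimal sketch. The line layout (`<timestamp> <service> <logger>: <LEVEL> <message>`) is inferred only from the single WARNING line quoted in this thread, so treat it as an assumption; `LINE_RE` and `count_problems` are hypothetical names, not part of Chia.

```python
import re
from collections import Counter

# ASSUMPTION: layout inferred from the one log line quoted in this thread:
# "<timestamp> <service> <logger>: <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+):\s+(WARNING|ERROR)\s+(.*)$")


def count_problems(lines):
    """Count WARNING/ERROR messages, keyed by (level, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            counts[(match.group(4), match.group(5))] += 1
    return counts


sample = [
    "2021-05-17T20:03:21.395 full_node chia.full_node.full_node: "
    "WARNING Invalid response for slot None",
]

if __name__ == "__main__":
    for (level, message), n in count_problems(sample).most_common():
        print(n, level, message)
```

To run it on a real log, pass an open file object for your own debug.log instead of `sample`; the file's location depends on your installation.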
Ah, yes. Funny thing: I don't see a single peer banned twice. So it is not a "couple of evil peers". |
-55. Now the delay happens on every second block. |
And from my log also: So, there is nothing we can really do from a client perspective at the moment. I've ruled out a CPU issue on my rig. We're at the mercy of the peer we connect to, and of whether that peer wants to give away some data. I'm not sure how this would improve at the moment: if the peers that are fully in sync are being bottlenecked, then things are just going to get worse as more machines come online. |
nope, it is, but at this point on their side :)
|
I am at "-39" now. |
What is happening: I need to get lucky that one of the peers will respond. So it keeps going through the list of peers (10 seconds each, or whatever it is) until it finds one that responds. The closer to the "edge", the more difficult it is to find one that has the resources to respond. |
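The behaviour described above (walking the peer list and waiting out a timeout on each unresponsive peer) can be illustrated with a small simulation. This is a sketch under stated assumptions, not Chia's actual networking code: `fake_peer` stands in for a real block request, the delays and timeout are made up, and the concurrent variant is shown only for contrast, not as something anyone in this thread confirmed would work.

```python
import asyncio


async def fake_peer(delay, block):
    """Hypothetical stand-in for a block request to one peer."""
    await asyncio.sleep(delay)
    return block


async def fetch_sequential(peer_delays, timeout):
    """Ask peers one at a time; every slow peer costs the full timeout."""
    for i, delay in enumerate(peer_delays):
        try:
            block = await asyncio.wait_for(fake_peer(delay, f"block-from-{i}"), timeout)
            return i, block
        except asyncio.TimeoutError:
            continue  # give up on this peer and try the next one
    return None


async def fetch_first(peer_delays, timeout):
    """Ask all peers at once and keep whichever answers first."""
    tasks = [asyncio.create_task(fake_peer(d, f"block-from-{i}"))
             for i, d in enumerate(peer_delays)]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop waiting on the slower peers
    await asyncio.gather(*pending, return_exceptions=True)
    if not done:
        return None
    winner = next(iter(done))
    return tasks.index(winner), winner.result()


if __name__ == "__main__":
    delays = [0.5, 0.5, 0.01]  # two overloaded peers, one responsive peer
    print(asyncio.run(fetch_sequential(delays, timeout=0.1)))
    print(asyncio.run(fetch_first(delays, timeout=0.1)))
```

With these made-up delays, the sequential loop spends two full timeouts before reaching the responsive peer, which mirrors the "need to get lucky" effect described in the comment.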
"-6" blocks mark! :) |
So, all we can do is
|
At the -5 blocks mark there are only 15 peers that are ahead of me. And now new coming. |
which property is for the max wait time? |
So, I have reached the current block. |
Funny, now all nodes in my peer list show |
so, that's a big F.. Something is VERY wrong..... |
this is the best I could get :) |
which property is for the max wait time? |
Have no clue. Didn't find. |
You can see that there are a number of sync bugs and discussions, and they are appearing at an increasing rate.
I looked into it for the last 2 days and the outcome is simple: once the number of transactions increased, sync speed started lagging behind the growth rate of the blockchain.
In my test, which I started from scratch and which has been running for the last 36 hours, I can't catch up to the current block, and this is happening on hardware you can't blame: Threadripper, a very fast SSD for the DB, and a real 1GbE connection.
This means that once a typical node gets out of sync for whatever reason, it is difficult to catch up. This is probably the reason for the massive surge of out-of-sync issues.