blockchain synchronisation speed deterioration #14330
Further investigation shows that this is a Disk I/O, i.e. database operations, problem.
Some data from Resource Explorer Disk Activity (screenshot omitted; columns were Image, PID, File, Read (B/sec), Write (B/sec), Total (B/sec), I/O Priority, Response Time (ms)) and storage parameters (only 4 snapshots taken; note the Disk Queue Length, I have just one disk). Can this be improved on a laptop with a single HDD?
Increasing the cache size ("geth --cache 1024 ..") makes for a not-worth-mentioning improvement. I see a max response time of 291 ms vs 526 ms above (20 ms can already be considered a problem).
Can you confirm whether this performance problem with LevelDB is platform-specific?
I didn't try this on Linux yet, but I definitely will, and soon. Meanwhile, hopefully somebody else can provide supporting evidence on the Linux side.
I would offer to work on improving this storage situation, but I'm away from home and without reliable broadband to sync the full database quickly enough. Do I need to have the full db synced to work on this? If so, how long would it take to sync the full thing over mobile at somewhere between 1 and 3 Mbit/s?
Fast sync works OK in the same setup.
Even now that Fast sync is over, my disk is 100% busy all the time, with an average disk queue length >2.
This seems potentially relevant, and it's still an open issue.
I see one potential problem: DefaultWriteBuffer = 4 * MiB (leveldb/opt/options.go)
With
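For illustration, here is a minimal sketch (not geth's actual code path) of opening a LevelDB database with a larger write buffer than goleveldb's 4 MiB default, assuming the github.com/syndtr/goleveldb library; the "chaindata" path and the 64 MiB / 256 MiB / 1024 values are arbitrary numbers chosen for the example, not geth defaults.

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Open the database with a larger write buffer than goleveldb's
	// DefaultWriteBuffer (4 MiB), so fewer but bigger memtable flushes hit the disk.
	// Sizes below are illustrative only.
	db, err := leveldb.OpenFile("chaindata", &opt.Options{
		WriteBuffer:            64 * opt.MiB,  // memtable size before it is flushed to an SST file
		BlockCacheCapacity:     256 * opt.MiB, // read cache for data blocks
		OpenFilesCacheCapacity: 1024,          // max number of cached open SST file handles
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	log.Println("database opened with enlarged write buffer")
}
```

A bigger memtable means fewer, larger flushes, which should matter most on a single HDD, though it does not change the hash-random write pattern discussed below.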
In the meantime I'm taking a look at RocksDB, checking how hard it would be to have a comparable / pluggable datastore based on Rocks.
We've had a PR a while back that relied on RocksDB. The underlying design is the same as LevelDB, so you won't see any meaningful improvements. The full historical database is huge (>140GB) and contains a humongous number of entries, all indexed by their hash (random). Hash-based indexing means the disk is thrashed with inserts all over the place. You can more or less forget doing a full sync on an HDD. As the database gets larger, the disk cannot keep up.
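To make the thrashing concrete, here is a small self-contained sketch of hash-keyed inserts. It uses SHA-256 purely for illustration (geth actually keys trie nodes by their Keccak-256 hash); the point is that consecutive writes land at effectively random positions in the key space, so there is no write locality for the disk or for LevelDB's compactions to exploit.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// Each "trie node" is stored under the hash of its contents, so even nodes
	// written back-to-back get keys scattered uniformly over the whole key space.
	for i := 0; i < 5; i++ {
		node := []byte(fmt.Sprintf("trie node payload %d", i))
		key := sha256.Sum256(node) // geth uses Keccak-256; SHA-256 shown for illustration
		fmt.Printf("insert #%d -> key prefix %x...\n", i, key[:4])
	}
}
```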
I'm running Parity for comparison. It uses RocksDB. So far it works better than geth: in 12 hours it synced 1815098 blocks, which is about half a million blocks more than geth did in the same time period.
Parity has state pruning, so it's comparing apples to oranges. If your database is much smaller, the speed will obviously be a lot faster, regardless of whether it's LevelDB or RocksDB.
I'm running Parity with "--pruning archive". I thought that was similar to geth without "--fast". OK then, no need to continue testing.
Hmmm, maybe you are right then.
Still, it's not really meaningful imho to compare the two. Parity doesn't store hash->trie mappings as far as I know, and they dropped support for serving eth/63 and fast sync in favor of their own warp sync, so the database might contain wildly different data. My point is that you can't compare LevelDB vs. RocksDB while wildly different data is being stored in them. The only valid way would be to drop rocksdb in place of leveldb in the codebase and see what it does.
Agree.
On the other hand, if both create a full node, it doesn't seem to matter that the data storage format is different, as long as the node is truly full. My concern is whether there are more efficient ways to organise the storage, or whether the prospect of specialised hardware is a fundamental reality.
Citing #1889:
Fast sync gets you a full node, but not an archive node. The difference being that, as opposed to an archive node which can retrieve arbitrarily old state, a full node can only retrieve relatively recent state. Generally though, there's not much reason to run an archive node unless you want to trace transactions from way back (e.g. a block explorer).
Well, then the doc has to be corrected. The sentence "act as a full archive node from all intents and purposes" is cited word for word. Although it is obvious that information is missing from a Fast-synced node: ~20GB vs ">140GB" for the full archive. I guess the full archive cannot be created from a Fast-synced node. I wonder if Fast nodes can serve the purpose of synchronisation at all. I'm studying the source code now to figure out the details of Fast sync. Anyway, the probabilistic validations and pruning seem to be inevitable future directions.
Fast sync has all the blocks of the network, so it can reconstruct the entire chain and past archival data if it really wants to, and yes, it can serve the purpose of synchronization (otherwise the entire network would have been long stuck). Only past state trie entries aren't stored (which, again, could be regenerated based on locally available data only).
It's better than I thought. So the trie hashes pretty much make up the lion's share of the datastore size. They will accumulate gradually again. Can geth be commanded to clean the trie up to a certain point (say, top - 1024) if desired?
Nick is actually working on that currently.
Like @allemathor, I am attempting to sync an archive node, and I found this discussion thread when I also observed that sync speed deteriorates significantly past block ~2,200,000. I am currently at block 2.47M after about 9 days of syncing. I know that the --fast option exists; however, I do in fact want to run transaction traces, as @karalabe mentioned. Is there any hope of actually being able to obtain traces of transactions in blocks past 3,000,000 without doing an archive sync and potentially waiting 1 month+ for it to finish?
I have the same issue. Syncing almost stops after block #2331165.
@Xeenych and anyone else who ran into this issue: if it's any help at all, my geth node's syncing sped up for a brief period between roughly blocks 2475000 and 2675000. Take a look at this plot illustrating the sync progress of my node: http://cs.brown.edu/~vsaraph/geth_plot.png. If you compare my plot to the growth of the full chaindata directory over time, you will see a spike in the size of the blockchain corresponding to the same interval of blocks I just mentioned. As you can see, my syncing has once again almost stopped, but I think this is because there was another spike in chaindata's size around Nov 22, 2016. I am expecting my syncing to pick up again after this second spike. I think the sudden growth in the blockchain around these points in time has something to do with the DAO hack and/or resolving it? For example, see this block: https://etherscan.io/block/2423581. There are a large number of internal transactions that supposedly created new accounts.
@vsaraph Why are you doing a full transaction-processing sync? That will take an insane amount of time and disk storage. A fast sync should get you on track in a couple of hours.
@Xeenych Yes, currently it's not feasible to sync the mainnet on an HDD.
@karalabe I need traces of transactions on the blockchain. When I tried running debug_traceTransaction (part of geth's management API) while fast syncing, I got an error about missing trie nodes, similar to this user: https://ethereum.stackexchange.com/questions/25041/geth-tracetransaction-missing-trie-node, who was told to use "--syncmode full". So one apparently needs an archive node to be able to execute debug_traceTransaction, and I'm not sure how to get around performing an archive sync of the blockchain.
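For reference, a minimal sketch of invoking debug_traceTransaction over geth's IPC endpoint using the go-ethereum rpc package; the IPC path and the transaction hash are placeholders, and the call only succeeds if the node still has the state of the containing block, which is why a fast-synced node reports missing trie nodes.

```go
package main

import (
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	// Connect to a locally running geth node over IPC (path is a placeholder).
	client, err := rpc.Dial("/home/user/.ethereum/geth.ipc")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Raw trace result; its exact shape depends on the tracer the node uses.
	var trace interface{}

	// Placeholder transaction hash; on a node without the relevant state this
	// call fails with a "missing trie node" error.
	txHash := "0x0000000000000000000000000000000000000000000000000000000000000000"
	if err := client.Call(&trace, "debug_traceTransaction", txHash); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", trace)
}
```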
try
System information
Geth
Version: 1.6.0-unstable
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.8
Operating System: windows
Hardware
Dell XPS i7, 2.20 GHz, 4 cores, 12 GB RAM, HDD + broadband connection
Expected behaviour
I'm already a week into the sync and would expect it to be done by now.
I also would expect a more or less stable or slightly degrading sync speed.
Actual behaviour
Started a full node on Apr 07, 2017
I witness a pretty much exponential degradation of the sync speed.
The first 1 million blocks synchronised in under 3 hours. After 1 million blocks the synchronisation continuously deteriorated.
Blocks ------------------- Sync Speed (blocks/hour)
0 - 1045923 --------------- 348,641
1045923 - 1726874 --------- 54,129
1726874 - 2322736 --------- 15,777
2322736 - 2344564 --------- 1,175
2407424 - 2425581 --------- 1,210
I can provide data on geth process performance (don't see an option to insert an image here)
Current log as of today 20170414
...........
INFO [04-14|19:27:18] Imported new chain segment blocks=8 txs=65 mgas=2.213 elapsed=3m19.046s mgasps=0.011 number=2425486 hash=1e4f1b…39ccc3
INFO [04-14|19:30:20] Imported new chain segment blocks=7 txs=54 mgas=2.162 elapsed=3m1.833s mgasps=0.012 number=2425493 hash=dc3cfa…496c62
INFO [04-14|19:32:48] Imported new chain segment blocks=2 txs=24 mgas=1.433 elapsed=2m27.802s mgasps=0.010 number=2425495 hash=ec348d…c0b1dc
INFO [04-14|19:33:18] Imported new chain segment blocks=1 txs=15 mgas=0.182 elapsed=30.056s mgasps=0.006 number=2425496 hash=6a5fbd…9757d2
INFO [04-14|19:36:17] Imported new chain segment blocks=5 txs=18 mgas=1.408 elapsed=2m58.670s mgasps=0.008 number=2425501 hash=3d9d0d…6f6ad2
INFO [04-14|19:37:54] Imported new chain segment blocks=6 txs=39 mgas=1.192 elapsed=1m37.747s mgasps=0.012 number=2425507 hash=8e29b3…e9d6e9
INFO [04-14|19:40:49] Imported new chain segment blocks=2 txs=19 mgas=1.184 elapsed=2m54.397s mgasps=0.007 number=2425509 hash=dae708…d5bb6b
INFO [04-14|19:44:13] Imported new chain segment blocks=5 txs=55 mgas=2.033 elapsed=3m23.924s mgasps=0.010 number=2425514 hash=1da67d…766e67
INFO [04-14|19:44:37] Imported new chain segment blocks=1 txs=15 mgas=0.350 elapsed=24.452s mgasps=0.014 number=2425515 hash=5ffe54…25d066
INFO [04-14|19:48:24] Imported new chain segment blocks=6 txs=61 mgas=2.264 elapsed=3m47.216s mgasps=0.010 number=2425521 hash=380e9b…75ad4f
INFO [04-14|19:52:34] Imported new chain segment blocks=6 txs=49 mgas=2.038 elapsed=4m9.630s mgasps=0.008 number=2425527 hash=d65a17…d45c4a
INFO [04-14|19:54:18] Imported new chain segment blocks=3 txs=38 mgas=1.000 elapsed=1m44.162s mgasps=0.010 number=2425530 hash=bc423e…26838f
INFO [04-14|19:58:01] Imported new chain segment blocks=14 txs=118 mgas=3.030 elapsed=3m42.548s mgasps=0.014 number=2425544 hash=99d02e…3ae18b
INFO [04-14|20:03:15] Imported new chain segment blocks=5 txs=23 mgas=1.376 elapsed=5m14.309s mgasps=0.004 number=2425549 hash=39fda9…8783be
INFO [04-14|20:03:24] Imported new chain segment blocks=15 txs=158 mgas=3.265 elapsed=9.301s mgasps=0.351 number=2425564 hash=37c91e…775da7
INFO [04-14|20:07:29] Imported new chain segment blocks=7 txs=54 mgas=1.369 elapsed=4m4.321s mgasps=0.006 number=2425571 hash=d61349…2e2bff
INFO [04-14|20:12:35] Imported new chain segment blocks=8 txs=84 mgas=2.512 elapsed=5m6.364s mgasps=0.008 number=2425579 hash=d3568c…11326f
INFO [04-14|20:17:43] Imported new chain segment blocks=2 txs=33 mgas=1.442 elapsed=5m8.283s mgasps=0.005 number=2425581 hash=de4acd…0bd866
.................
As seen from the log, many blocks take minutes to sync.
This morning I was at block 2407424 and now, after 15 hours, it's 2425581, i.e. 18157 blocks in 15 hours => 1210 blocks/hour.
The sync has already been running for a week, although on normal work days I only run it at night.
Steps to reproduce the behaviour
Just run the latest build from Git with default arguments
Backtrace
Start of today's log
WARN [04-13|16:59:45] No etherbase set and no accounts found as default
INFO [04-13|16:59:45] Starting peer-to-peer node instance=Geth/v1.6.0-unstable/windows-amd64/go1.8
INFO [04-13|16:59:45] Allocated cache and file handles database=C:\Users\xxx\AppData\Roaming\Ethereum_go\geth\chaindata cache=128 handles=1024
INFO [04-13|16:59:51] Initialised chain configuration config="{ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Engine: ethash}"
INFO [04-13|16:59:51] Disk storage enabled for ethash caches dir=C:\Users\xxx\AppData\Roaming\Ethereum_go\geth\ethash count=3
INFO [04-13|16:59:51] Disk storage enabled for ethash DAGs dir=C:\Users\xxx\AppData\Ethash count=2
INFO [04-13|16:59:52] Initialising Ethereum protocol versions="[63 62]" network=1
INFO [04-13|16:59:52] Loaded most recent local header number=2407424 hash=049007…fa24ce td=71875898511406243634
INFO [04-13|16:59:52] Loaded most recent local full block number=2407424 hash=049007…fa24ce td=71875898511406243634
INFO [04-13|16:59:52] Loaded most recent local fast block number=2407424 hash=049007…fa24ce td=71875898511406243634
INFO [04-13|16:59:52] Starting P2P networking
INFO [04-13|16:59:54] RLPx listener up self=enode://16d62db982f324a25592c85384341c2938356b8ecedff160ef5aeb4bf1e7ccf4620352aae32f235596b3ec2bdc829692887fd1dc5b6bdf9345466c321b1dd86d@[::]:30303
INFO [04-13|16:59:54] IPC endpoint opened: \.\pipe\geth.ipc
INFO [04-13|17:04:04] Block synchronisation started
INFO [04-13|17:04:09] Generating ethash verification cache epoch=80 percentage=68 elapsed=3.000s
INFO [04-13|17:04:11] Generated ethash verification cache epoch=80 elapsed=4.498s
ERROR[04-13|17:04:11] Failed to generate mapped ethash cache epoch=80 err="FlushFileBuffers: The handle is invalid."
INFO [04-13|17:04:14] Generating ethash verification cache epoch=80 percentage=69 elapsed=3.000s
INFO [04-13|17:04:16] Generated ethash verification cache epoch=80 elapsed=4.540s
INFO [04-13|17:04:19] Generating ethash verification cache epoch=81 percentage=68 elapsed=3.000s
INFO [04-13|17:04:20] Generated ethash verification cache epoch=81 elapsed=4.518s
ERROR[04-13|17:04:21] Failed to generate mapped ethash cache epoch=81 err="FlushFileBuffers: The handle is invalid."
INFO [04-13|17:04:24] Generating ethash verification cache epoch=81 percentage=69 elapsed=3.000s
INFO [04-13|17:04:25] Generated ethash verification cache epoch=81 elapsed=4.413s
INFO [04-13|17:32:59] Imported new chain segment blocks=1 txs=23 mgas=1.464 elapsed=28m52.334s mgasps=0.001 number=2407425 hash=bb11c2…479783
INFO [04-13|17:33:01] Imported new chain segment blocks=1 txs=89 mgas=1.455 elapsed=2.384s mgasps=0.610 number=2407426 hash=d15a0f…6a7b2c
INFO [04-13|17:33:13] Imported new chain segment blocks=4 txs=49 mgas=3.495 elapsed=11.505s mgasps=0.304 number=2407430 hash=b08729…74ca3e
INFO [04-13|17:33:23] Imported new chain segment blocks=8 txs=100 mgas=3.786 elapsed=10.420s mgasps=0.363 number=2407438 hash=a33f81…ecfc8c
INFO [04-13|17:33:33] Imported new chain segment blocks=8 txs=89 mgas=3.807 elapsed=10.407s mgasps=0.366 number=2407446 hash=3201c8…01a925
INFO [04-13|17:33:45] Imported new chain segment blocks=5 txs=100 mgas=3.643 elapsed=11.719s mgasps=0.311 number=2407451 hash=e856cc…7bcb91
INFO [04-13|17:33:57] Imported new chain segment blocks=10 txs=63 mgas=4.668 elapsed=11.393s mgasps=0.4
I found some relevant issues/discussions e.g.
#3207
https://forum.ethereum.org/discussion/6327/block-sync-so-slow
Those discussions only accentuate the slow speed of the sync. My point, however, is that the speed deteriorates with the size of the datastore, and I wonder whether this is a fundamental problem with the implementation and how it can be improved.
The following discussion was started by me on an apparently dead forum:
http://forum.ethereum.org/discussion/11826/eth-full-node-running-errors