blockchain synchronisation speed deterioration #14330
Further investigation shows that this is a Disk I/O, i.e. database operations, problem.
Some data from Resource Explorer Disk Activity (screenshot omitted; columns were Image, PID, File, Read (B/sec), Write (B/sec), Total (B/sec), I/O Priority, Response Time (ms)) and storage parameters (only 4 snapshots taken; note the Disk Queue Length, I have just one disk). Can this be improved on a laptop with a single HDD?
Increasing the cache size ("geth --cache 1024 ..") makes for a not-worth-mentioning improvement. I see a max response time of 291 ms vs 526 ms above (20 ms can already be considered a problem).
Can you confirm whether this performance problem with LevelDB is platform-specific?
I didn't try this on Linux yet, but I definitely will, and soon. Meanwhile, hopefully somebody else can provide supporting evidence on the Linux side.
I would offer to work on improving this storage situation, but I'm away from home and without reliable broadband to sync the full database quickly enough. Do I need to have the full db synced to work on this? If so, how long would it take to sync the full thing over mobile at somewhere between 1 and 3 Mbit/s?
Fast sync works OK in the same setup.
Even now that Fast sync is over, my disk is 100% busy all the time, with an average disk queue length >2.
This seems potentially relevant, and it's still an open issue.
I see one potential problem: DefaultWriteBuffer = 4 * MiB (leveldb/opt/options.go)
With
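For illustration, here is a minimal sketch (not geth's actual code path) of opening a LevelDB database with a larger write buffer than goleveldb's 4 MiB default, assuming the github.com/syndtr/goleveldb library; the "chaindata" path and the 64 MiB / 256 MiB / 1024 values are arbitrary numbers chosen for the example, not geth defaults.

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Open the database with a larger write buffer than goleveldb's
	// DefaultWriteBuffer (4 MiB), so fewer but bigger memtable flushes hit the disk.
	// Sizes below are illustrative only.
	db, err := leveldb.OpenFile("chaindata", &opt.Options{
		WriteBuffer:            64 * opt.MiB,  // memtable size before it is flushed to an SST file
		BlockCacheCapacity:     256 * opt.MiB, // read cache for data blocks
		OpenFilesCacheCapacity: 1024,          // max number of cached open SST file handles
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	log.Println("database opened with enlarged write buffer")
}
```

A bigger memtable means fewer, larger flushes, which should matter most on a single HDD, though it does not change the hash-random write pattern discussed below.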
In the meantime I'm taking a look at RocksDB, checking how hard it would be to have a comparable / pluggable datastore based on Rocks.
We've had a PR a while back that relied on RocksDB. The underlying design is the same as LevelDB, so you won't see any meaningful improvements. The full historical database is huge (>140GB) and contains a humongous number of entries, all indexed by their hash (random). Hash-based indexing means the disk is thrashed with inserts all over the place. You can more or less forget doing a full sync on an HDD. As the database gets larger, the disk cannot keep up.
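To make the thrashing concrete, here is a small self-contained sketch of hash-keyed inserts. It uses SHA-256 purely for illustration (geth actually keys trie nodes by their Keccak-256 hash); the point is that consecutive writes land at effectively random positions in the key space, so there is no write locality for the disk or for LevelDB's compactions to exploit.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// Each "trie node" is stored under the hash of its contents, so even nodes
	// written back-to-back get keys scattered uniformly over the whole key space.
	for i := 0; i < 5; i++ {
		node := []byte(fmt.Sprintf("trie node payload %d", i))
		key := sha256.Sum256(node) // geth uses Keccak-256; SHA-256 shown for illustration
		fmt.Printf("insert #%d -> key prefix %x...\n", i, key[:4])
	}
}
```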
I'm running Parity for comparison. It uses RocksDB. So far it works better than geth: in 12 hours it synced 1815098 blocks, which is about half a million blocks more than geth did in the same time period.
Parity has state pruning, so it's comparing apples to oranges. If your database is much smaller, the speed will obviously be a lot faster, regardless of whether it's LevelDB or RocksDB.
I'm running Parity with "--pruning archive". I thought that was similar to geth without "--fast". OK then, no need to continue testing.
Hmmm, maybe you are right then.
Still, it's not really meaningful imho to compare the two. Parity doesn't store hash->trie mappings as far as I know, and they dropped support for serving eth/63 and fast sync in favor of their own warp sync, so the database might contain wildly different data. My point is that you can't compare LevelDB vs. RocksDB while wildly different data is being stored in them. The only valid way would be to drop rocksdb in place of leveldb in the codebase and see what it does.
Agree.
On the other hand, if both create a full node, it doesn't seem to matter that the data storage format is different, as long as the node is truly full. My concern is whether there are more efficient ways to organise the storage, or whether the prospect of specialised hardware is a fundamental reality.
Citing #1889:
Fast sync gets you a full node, but not an archive node. The difference being that, as opposed to an archive node which can retrieve arbitrarily old state, a full node can only retrieve relatively recent state. Generally though, there's not much reason to run an archive node unless you want to trace transactions from way back (e.g. a block explorer).
Well, then the doc has to be corrected. The sentence "act as a full archive node from all intents and purposes" is cited word for word. Although it is obvious that information is missing from a Fast-synced node: ~20GB vs ">140GB" for the full archive. I guess the full archive cannot be created from a Fast-synced node. I wonder if Fast nodes can serve the purpose of synchronisation at all. I'm studying the source code now to figure out the details of Fast sync. Anyway, the probabilistic validations and pruning seem to be inevitable future directions.
Fast sync has all the blocks of the network, so it can reconstruct the entire chain and past archival data if it really wants to, and yes, it can serve the purpose of synchronization (otherwise the entire network would have been long stuck). Only past state trie entries aren't stored (which, again, could be regenerated based on locally available data only).
It's better than I thought. So the trie hashes pretty much make up the lion's share of the datastore size. They will accumulate gradually again. Can geth be commanded to clean the trie up to a certain point (say, top - 1024) if desired?
Nick is actually working on that currently.
Like @allemathor, I am attempting to sync an archive node, and I found this discussion thread when I also observed that sync speed deteriorates significantly past block ~2,200,000. I am currently at block 2.47M after about 9 days of syncing. I know that the --fast option exists; however, I do in fact want to run transaction traces, as @karalabe mentioned. Is there any hope of actually being able to obtain traces of transactions in blocks past 3,000,000 without doing an archive sync and potentially waiting 1 month+ for it to finish?
I have the same issue. Syncing almost stops after block #2331165.
@Xeenych and anyone else who ran into this issue: if it's any help at all, my geth node's syncing sped up for a brief period between roughly blocks 2475000 and 2675000. Take a look at this plot illustrating the sync progress of my node: http://cs.brown.edu/~vsaraph/geth_plot.png. If you compare my plot to the growth of the full chaindata directory over time, you will see a spike in the size of the blockchain corresponding to the same interval of blocks I just mentioned. As you can see, my syncing has once again almost stopped, but I think this is because there was another spike in chaindata's size around Nov 22, 2016. I am expecting my syncing to pick up again after this second spike. I think the sudden growth in the blockchain around these points in time has something to do with the DAO hack and/or resolving it? For example, see this block: https://etherscan.io/block/2423581. There are a large number of internal transactions that supposedly created new accounts.
@vsaraph Why are you doing a full transaction-processing sync? That will take an insane amount of time and disk storage. A fast sync should get you on track in a couple of hours.
@Xeenych Yes, currently it's not feasible to sync the mainnet on an HDD.
@karalabe I need traces of transactions on the blockchain. When I tried running debug_traceTransaction (part of geth's management API) while fast syncing, I got an error about missing trie nodes, similar to this user: https://ethereum.stackexchange.com/questions/25041/geth-tracetransaction-missing-trie-node, who was told to use "--syncmode full". So one apparently needs an archive node to be able to execute debug_traceTransaction, and I'm not sure how to get around performing an archive sync of the blockchain.
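For reference, a minimal sketch of invoking debug_traceTransaction over geth's IPC endpoint using the go-ethereum rpc package; the IPC path and the transaction hash are placeholders, and the call only succeeds if the node still has the state of the containing block, which is why a fast-synced node reports missing trie nodes.

```go
package main

import (
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	// Connect to a locally running geth node over IPC (path is a placeholder).
	client, err := rpc.Dial("/home/user/.ethereum/geth.ipc")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Raw trace result; its exact shape depends on the tracer the node uses.
	var trace interface{}

	// Placeholder transaction hash; on a node without the relevant state this
	// call fails with a "missing trie node" error.
	txHash := "0x0000000000000000000000000000000000000000000000000000000000000000"
	if err := client.Call(&trace, "debug_traceTransaction", txHash); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", trace)
}
```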
try
System information
Geth
Version: 1.6.0-unstable
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.8
Operating System: windows
Hardware
Dell XPS i7, 2.20 GHz, 4 cores, 12 GB RAM, HDD + broadband connection
Expected behaviour
I'm already a week into the sync and would expect it to be done by now.
I also would expect a more or less stable or slightly degrading sync speed.
Actual behaviour
Started a full node on Apr 07, 2017
I witness a pretty much exponential degradation of the sync speed.
The first 1 million blocks synchronised in under 3 hours. After 1 million blocks the synchronisation continuously deteriorated.
Blocks ------------------- Sync Speed (blocks/hour)
0 - 1045923 --------------- 348,641
1045923 - 1726874 --------- 54,129
1726874 - 2322736 --------- 15,777
2322736 - 2344564 --------- 1,175
2407424 - 2425581 --------- 1,210
I can provide data on geth process performance (don't see an option to insert an image here)
Current log as of today 20170414
...........
INFO [04-14|19:27:18] Imported new chain segment blocks=8 txs=65 mgas=2.213 elapsed=3m19.046s mgasps=0.011 number=2425486 hash=1e4f1b…39ccc3
INFO [04-14|19:30:20] Imported new chain segment blocks=7 txs=54 mgas=2.162 elapsed=3m1.833s mgasps=0.012 number=2425493 hash=dc3cfa…496c62
INFO [04-14|19:32:48] Imported new chain segment blocks=2 txs=24 mgas=1.433 elapsed=2m27.802s mgasps=0.010 number=2425495 hash=ec348d…c0b1dc
INFO [04-14|19:33:18] Imported new chain segment blocks=1 txs=15 mgas=0.182 elapsed=30.056s mgasps=0.006 number=2425496 hash=6a5fbd…9757d2
INFO [04-14|19:36:17] Imported new chain segment blocks=5 txs=18 mgas=1.408 elapsed=2m58.670s mgasps=0.008 number=2425501 hash=3d9d0d…6f6ad2
INFO [04-14|19:37:54] Imported new chain segment blocks=6 txs=39 mgas=1.192 elapsed=1m37.747s mgasps=0.012 number=2425507 hash=8e29b3…e9d6e9
INFO [04-14|19:40:49] Imported new chain segment blocks=2 txs=19 mgas=1.184 elapsed=2m54.397s mgasps=0.007 number=2425509 hash=dae708…d5bb6b
INFO [04-14|19:44:13] Imported new chain segment blocks=5 txs=55 mgas=2.033 elapsed=3m23.924s mgasps=0.010 number=2425514 hash=1da67d…766e67
INFO [04-14|19:44:37] Imported new chain segment blocks=1 txs=15 mgas=0.350 elapsed=24.452s mgasps=0.014 number=2425515 hash=5ffe54…25d066
INFO [04-14|19:48:24] Imported new chain segment blocks=6 txs=61 mgas=2.264 elapsed=3m47.216s mgasps=0.010 number=2425521 hash=380e9b…75ad4f
INFO [04-14|19:52:34] Imported new chain segment blocks=6 txs=49 mgas=2.038 elapsed=4m9.630s mgasps=0.008 number=2425527 hash=d65a17…d45c4a
INFO [04-14|19:54:18] Imported new chain segment blocks=3 txs=38 mgas=1.000 elapsed=1m44.162s mgasps=0.010 number=2425530 hash=bc423e…26838f
INFO [04-14|19:58:01] Imported new chain segment blocks=14 txs=118 mgas=3.030 elapsed=3m42.548s mgasps=0.014 number=2425544 hash=99d02e…3ae18b
INFO [04-14|20:03:15] Imported new chain segment blocks=5 txs=23 mgas=1.376 elapsed=5m14.309s mgasps=0.004 number=2425549 hash=39fda9…8783be
INFO [04-14|20:03:24] Imported new chain segment blocks=15 txs=158 mgas=3.265 elapsed=9.301s mgasps=0.351 number=2425564 hash=37c91e…775da7
INFO [04-14|20:07:29] Imported new chain segment blocks=7 txs=54 mgas=1.369 elapsed=4m4.321s mgasps=0.006 number=2425571 hash=d61349…2e2bff
INFO [04-14|20:12:35] Imported new chain segment blocks=8 txs=84 mgas=2.512 elapsed=5m6.364s mgasps=0.008 number=2425579 hash=d3568c…11326f
INFO [04-14|20:17:43] Imported new chain segment blocks=2 txs=33 mgas=1.442 elapsed=5m8.283s mgasps=0.005 number=2425581 hash=de4acd…0bd866
.................
As seen from the log, many blocks take minutes to sync.
This morning I was at block 2407424 and now, after 15 hours, it's 2425581, i.e. 18157 blocks in 15 hours => 1210 blocks/hour.
The sync has already been running for a week, although on normal work days I only run it at night.
Steps to reproduce the behaviour
Just run the latest build from Git with default arguments
Backtrace
Start of today's log
WARN [04-13|16:59:45] No etherbase set and no accounts found as default
INFO [04-13|16:59:45] Starting peer-to-peer node instance=Geth/v1.6.0-unstable/windows-amd64/go1.8
INFO [04-13|16:59:45] Allocated cache and file handles database=C:\Users\xxx\AppData\Roaming\Ethereum_go\geth\chaindata cache=128 handles=1024
INFO [04-13|16:59:51] Initialised chain configuration config="{ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Engine: ethash}"
INFO [04-13|16:59:51] Disk storage enabled for ethash caches dir=C:\Users\xxx\AppData\Roaming\Ethereum_go\geth\ethash count=3
INFO [04-13|16:59:51] Disk storage enabled for ethash DAGs dir=C:\Users\xxx\AppData\Ethash count=2
INFO [04-13|16:59:52] Initialising Ethereum protocol versions="[63 62]" network=1
INFO [04-13|16:59:52] Loaded most recent local header number=2407424 hash=049007…fa24ce td=71875898511406243634
INFO [04-13|16:59:52] Loaded most recent local full block number=2407424 hash=049007…fa24ce td=71875898511406243634
INFO [04-13|16:59:52] Loaded most recent local fast block number=2407424 hash=049007…fa24ce td=71875898511406243634
INFO [04-13|16:59:52] Starting P2P networking
INFO [04-13|16:59:54] RLPx listener up self=enode://16d62db982f324a25592c85384341c2938356b8ecedff160ef5aeb4bf1e7ccf4620352aae32f235596b3ec2bdc829692887fd1dc5b6bdf9345466c321b1dd86d@[::]:30303
INFO [04-13|16:59:54] IPC endpoint opened: \.\pipe\geth.ipc
INFO [04-13|17:04:04] Block synchronisation started
INFO [04-13|17:04:09] Generating ethash verification cache epoch=80 percentage=68 elapsed=3.000s
INFO [04-13|17:04:11] Generated ethash verification cache epoch=80 elapsed=4.498s
ERROR[04-13|17:04:11] Failed to generate mapped ethash cache epoch=80 err="FlushFileBuffers: The handle is invalid."
INFO [04-13|17:04:14] Generating ethash verification cache epoch=80 percentage=69 elapsed=3.000s
INFO [04-13|17:04:16] Generated ethash verification cache epoch=80 elapsed=4.540s
INFO [04-13|17:04:19] Generating ethash verification cache epoch=81 percentage=68 elapsed=3.000s
INFO [04-13|17:04:20] Generated ethash verification cache epoch=81 elapsed=4.518s
ERROR[04-13|17:04:21] Failed to generate mapped ethash cache epoch=81 err="FlushFileBuffers: The handle is invalid."
INFO [04-13|17:04:24] Generating ethash verification cache epoch=81 percentage=69 elapsed=3.000s
INFO [04-13|17:04:25] Generated ethash verification cache epoch=81 elapsed=4.413s
INFO [04-13|17:32:59] Imported new chain segment blocks=1 txs=23 mgas=1.464 elapsed=28m52.334s mgasps=0.001 number=2407425 hash=bb11c2…479783
INFO [04-13|17:33:01] Imported new chain segment blocks=1 txs=89 mgas=1.455 elapsed=2.384s mgasps=0.610 number=2407426 hash=d15a0f…6a7b2c
INFO [04-13|17:33:13] Imported new chain segment blocks=4 txs=49 mgas=3.495 elapsed=11.505s mgasps=0.304 number=2407430 hash=b08729…74ca3e
INFO [04-13|17:33:23] Imported new chain segment blocks=8 txs=100 mgas=3.786 elapsed=10.420s mgasps=0.363 number=2407438 hash=a33f81…ecfc8c
INFO [04-13|17:33:33] Imported new chain segment blocks=8 txs=89 mgas=3.807 elapsed=10.407s mgasps=0.366 number=2407446 hash=3201c8…01a925
INFO [04-13|17:33:45] Imported new chain segment blocks=5 txs=100 mgas=3.643 elapsed=11.719s mgasps=0.311 number=2407451 hash=e856cc…7bcb91
INFO [04-13|17:33:57] Imported new chain segment blocks=10 txs=63 mgas=4.668 elapsed=11.393s mgasps=0.4
I found some relevant issues/discussions e.g.
#3207
https://forum.ethereum.org/discussion/6327/block-sync-so-slow
Those discussions only accentuate the slow speed of the sync. My point, however, is that the speed deteriorates with the size of the datastore, and I wonder whether this is a fundamental problem with the implementation and how it can be improved.
The following discussion was started by me on an apparently dead forum:
http://forum.ethereum.org/discussion/11826/eth-full-node-running-errors