This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Initial Sync disk IO / write amplification / disk usage #6280

Closed
c0deright opened this issue Aug 10, 2017 · 25 comments · Fixed by #7348
Labels
F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint. M4-core ⛓ Core client code / Rust. P0-dropeverything 🌋 Everyone should address the issue now. Q7-involved 💪 Can be fixed by a team of developers and probably takes some time.

@c0deright

c0deright commented Aug 10, 2017

  • Parity version: 1.7.0
  • Operating system: Ubuntu 16.04 LTS
  • And installed: via deb Package from https://parity.io/parity.html
  • cmd line: parity daemon /foo/bar/parity.pid
  • config.toml
[ui]
disable = true

[network]
nat = "any"
discovery = true
no_warp = true
allow_ips = "public"

[rpc]
disable = true

[websockets]
disable = true

[ipc]
disable = true

[dapps]
disable = true

[footprint]
tracing = "off"
db_compaction = "ssd"
pruning = "archive"
cache_size = 55000

[snapshots]
disable_periodic = true

[misc]
log_file = "/foo/bar/parity.txt"

I'm trying to set up a full node with complete history, thus pruning=archive.

Disk IO looks like this on Amazon EC2 instance type c4.8xlarge:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvdf           4252.00         0.36       519.55          0        519

parity is constantly writing to disk at ~300-500 MB/s, and peaks reach ~5000 IOPS.

What really bothers me is that parity is wasting so much disk space. There are times when I can see /home grow by 1 GB/s just by watching df -h /home.

Some time after passing 2 million blocks, disk usage on /home was ~80 GB from parity alone. When I stop parity, that 80 GB magically shrinks to ~37 GB, only to grow by 1 GB/s again after restarting parity.

Parity even ran out of disk space after filling up a 100GB EBS volume on Amazon AWS and at that time it had only downloaded about 50% of blocks.

My questions are:

  • Why is parity writing at such a high volume to disk even though I've set the cache size to 55 GB?
    • Is this some sort of write amplification? It doesn't seem logical to me that an application that downloads less than 1 MB/s from the internet is writing 500 MB/s to disk at all times.
  • Why is parity using so much disk space temporarily? Even without restarting parity, disk usage sometimes goes down 20 GB or more from one second to the next.
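
For a rough sense of scale, the figures in this report can be turned into a write-amplification ratio. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the inputs are the write rate from the iostat output and the 1 MB/s download ceiling quoted in the question, so the result is a lower bound rather than a measurement.

```python
def write_amplification(disk_write_mb_s: float, network_in_mb_s: float) -> float:
    """Ratio of data written to disk to data received from the network."""
    if network_in_mb_s <= 0:
        raise ValueError("network_in_mb_s must be positive")
    return disk_write_mb_s / network_in_mb_s


# Figures from the report: 519.55 MB/s written, under 1 MB/s downloaded.
ratio = write_amplification(519.55, 1.0)
print(f"at least {ratio:.0f}x more data written than downloaded")
```

Because the real download rate is below 1 MB/s, the true ratio would be higher than whatever this prints.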
@c0deright c0deright changed the title Initial Sync disk IO / write amplification Initial Sync disk IO / write amplification / disk usage Aug 10, 2017
@5chdn 5chdn added Z1-question 🙋‍♀️ Issue is a question. Closer should answer. F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M4-core ⛓ Core client code / Rust. and removed Z1-question 🙋‍♀️ Issue is a question. Closer should answer. labels Aug 11, 2017
@5chdn
Contributor

5chdn commented Aug 11, 2017

You are not alone with this issue.

Probably related, but not obviously:

@ezredd

ezredd commented Aug 11, 2017

I am the user who posted

https://ethereum.stackexchange.com/questions/24158/speed-of-syncing-the-chain-in-parity-using-archive-pruning-mode

The size of my node when I don't run parity is about 235GB. When I launched it overnight with this command

parity --pruning archive --snapshot-peers 40 --cache-size-db 256 --cache-size-blocks 128 --cache-size-queue 256 --cache-size-state 256 --cache-size 4096 --db-compaction hdd

it peaked at 430GB in the morning, and when I closed parity it went back to around 230GB.

For information about my system, I use:

  • Parity: version Parity/v1.7.0-beta-5f2cabd-20170727/x86_64-macos/rustc1.18.0

  • MacOS Sierra Version 10.12.2 (16C67) on an iMac 27-inch Mid 2011 with 24GB of DDR3 ram

And here's a pastebin of my sync log with parity:

https://pastebin.com/pXZTL7G6

@c0deright
Author

c0deright commented Aug 14, 2017

Disk Usage while still Syncing

The node was syncing until late Sunday, when disk usage dropped and then stayed down.

Disk usage right now is 218 GB and while sync was active parity used more than twice that amount (488 GB).

@arkpar
Collaborator

arkpar commented Aug 14, 2017

Cache settings do not really affect write amplification. They are designed to reduce read amplification and minimize block processing times when the node is up to date. db_compaction is the only option that trades write amplification for space, IIRC. Parity uses RocksDB as the underlying database. Importing a block involves adding a lot of random key-value pairs to the database, and some space is preallocated for faster insertion of new keys. Unused space is freed by the background compaction process. See here for more details:
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

We have a long term plan to move to a custom database backend that would allow for a more efficient state I/O.

@5chdn 5chdn added the F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint. label Aug 16, 2017
@gituser

gituser commented Aug 26, 2017

subscribing to this as well, it's so annoying, parity is eating I/O like a monster constantly, 1000 times more than bitcoin or any bitcoin based coin.

Is there currently no workaround to control/limit the I/O without breaking the syncing process, @arkpar?

@5chdn 5chdn added the P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. label Aug 28, 2017
@arkpar
Collaborator

arkpar commented Aug 28, 2017

@gituser could you post logs?

@c0deright
Author

c0deright commented Sep 4, 2017

Disk Usage today
(Times are UTC +0200)

Parity running with

[footprint]
pruning = "archive"

just went from 250G to 500G in 1 hour, filling the volume.

After resizing the volume to 750G, disk usage dropped back to 250G as soon as I started parity again.

I'm constantly having WTF moments working with parity :)

It seems we have to deploy monit to restart parity as soon as it goes nuts to prevent it filling the whole volume in minutes.

@5chdn 5chdn added P0-dropeverything 🌋 Everyone should address the issue now. Q7-involved 💪 Can be fixed by a team of developers and probably takes some time. and removed P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. labels Sep 4, 2017
@5chdn
Contributor

5chdn commented Sep 4, 2017

Work on that has already started, but it includes a new database layer and a lot of refactoring. It will not be available before 1.8. #6418

@c0deright
Author

c0deright commented Sep 4, 2017

The cause of this spike seems to be a reorg:

2017-09-04 09:30:14  Imported #4236763 fe21…0f40 (165 txs, 6.68 Mgas, 5883.20 ms, 24.42 KiB)
2017-09-04 09:30:26     1/25 peers     34 MiB chain   73 MiB db  0 bytes queue   23 KiB sync  RPC:  0 conn,  0 req/s, 14381 µs
2017-09-04 09:30:38  Imported #4236765 2d3e…f35e (63 txs, 2.86 Mgas, 1266.32 ms, 9.94 KiB)
2017-09-04 09:30:56     1/25 peers     34 MiB chain   73 MiB db  0 bytes queue   23 KiB sync  RPC:  0 conn,  0 req/s, 14381 µs
2017-09-04 09:30:59  Reorg to #4236765 0cca…518a (2d3e…f35e #4236764 2a6f…58cb )
2017-09-04 09:30:59  Imported #4236765 0cca…518a (159 txs, 6.69 Mgas, 3762.77 ms, 30.28 KiB)
2017-09-04 09:31:04  Imported #4236766 2c59…fd49 (184 txs, 6.68 Mgas, 2557.42 ms, 26.92 KiB)
2017-09-04 09:31:23  Imported #4236767 cfd6…0e80 (113 txs, 6.66 Mgas, 8276.83 ms, 20.67 KiB)

The growth starts at around 09:30, when the reorg is logged.

Reorgs happened often before, though, so it's not clear this has anything to do with the issue. For reference:

2017-09-04 04:09:13  Reorg to #4235967 931d…9fa8 (f0bb…9939 #4235965 7dec…6b4d c8c1…2510)
2017-09-04 04:21:15  Reorg to #4235996 e2b1…5bda (2ed2…bea8 #4235994 b3c6…65b0 1253…f871)
2017-09-04 05:16:45  Reorg to #4236119 c8cb…3545 (aacf…74b8 #4236118 131a…5288 )
2017-09-04 05:27:31  Reorg to #4236146 40bc…1c7d (4b8f…1cf5 #4236144 b689…249b d88e…949e)
2017-09-04 05:41:06  Reorg to #4236172 3cb3…7aa6 (d1ab…78f9 #4236170 8e5a…9b80 e40d…352a)
2017-09-04 05:52:36  Reorg to #4236198 1508…d746 (c450…7ee6 #4236196 c96a…c7ce a5d4…98a6)
2017-09-04 06:24:51  Reorg to #4236272 3633…abc7 (5ed8…b8c4 #4236270 0454…a9d7 6d2c…b25e)
2017-09-04 06:34:31  Reorg to #4236303 39c3…15fe (d88f…c139 #4236302 d470…3f4d )
2017-09-04 06:35:20  Reorg to #4236306 4ce7…ee98 (7d7e…5102 #4236304 6fe2…bbb9 66cb…932b)
2017-09-04 07:04:32  Reorg to #4236389 cc2c…8777 (8c73…575d #4236387 3ae2…d444 5bf5…da71)
2017-09-04 07:10:07  Reorg to #4236408 a094…9bc0 (3940…4ed9 #4236406 9bba…d53a 867b…f650)
2017-09-04 07:10:07  Reorg to #4236408 6ae1…972a (a094…9bc0 867b…f650 #4236406 9bba…d53a 3940…4ed9)
2017-09-04 07:54:43  Reorg to #4236507 cc53…1e5e (5472…7834 #4236506 c949…f930 )
2017-09-04 08:12:32  Reorg to #4236552 1402…1940 (d398…8aed #4236550 8dbf…055d d5f5…1a21)
2017-09-04 08:16:19  Reorg to #4236565 9ab9…866d (d499…8864 d132…bcb1 #4236563 2d7e…36b6 4b7b…4307)
2017-09-04 08:44:54  Reorg to #4236631 58ad…06c8 (0012…8979 #4236630 81ca…ed52 )
2017-09-04 09:06:19  Reorg to #4236703 781f…417f (f858…ab4e #4236702 21e4…8cbc )
2017-09-04 09:09:26  Reorg to #4236713 485b…db86 (43e6…a7ca #4236711 2e33…bff9 e667…582f)
2017-09-04 09:24:57  Reorg to #4236746 b5d2…d179 (4f2e…9503 #4236744 5657…8057 ff00…b65f)
2017-09-04 09:30:59  Reorg to #4236765 0cca…518a (2d3e…f35e #4236764 2a6f…58cb )
2017-09-04 09:40:53  Reorg to #4236788 195e…132b (9cba…ad41 #4236787 fafe…af79 )
2017-09-04 09:48:20  Reorg to #4236808 d6e5…9fd1 (55d2…e9a7 #4236807 c9ed…1dd2 )
2017-09-04 09:55:50  Reorg to #4236822 121d…7faf (a1f1…27d2 #4236820 9c71…a3da dcea…08f3)
2017-09-04 10:33:52  Reorg to #4236913 8596…af57 (24c1…a231 24e2…6124 #4236910 bbca…4a4e 3a65…3403 e9d5…e5d5)
2017-09-04 10:43:20  Reorg to #4236934 8c5e…70ee (e15b…bb99 #4236932 6418…6856 3330…2be1)
2017-09-04 10:52:00  Reorg to #4236959 56ba…53c4 (51c5…8cdb 7c08…8f92 #4236957 ad4e…c246 97c6…4a75)
2017-09-04 11:04:25  Reorg to #4236988 d12b…14e8 (073b…1ba9 #4236986 7d5c…d551 5194…2a10)
2017-09-04 11:07:14  Reorg to #4237000 6de0…f926 (bace…f9fc #4236998 2c6a…baeb 1806…e8e7)
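
To check whether disk growth actually lines up with reorgs, the timestamps can be pulled out of the log and compared against a disk usage graph. The following is a minimal sketch that assumes the log line layout shown above; `parse_reorgs` and the regular expression are illustrative and not part of Parity.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2017-09-04 09:30:59  Reorg to #4236765 0cca…518a (2d3e…f35e #4236764 2a6f…58cb )
REORG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"Reorg to #(?P<block>\d+)\s+(?P<hash>\S+)"
)


def parse_reorgs(lines):
    """Return (timestamp, block_number, short_hash) for every 'Reorg to' line."""
    reorgs = []
    for line in lines:
        m = REORG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            reorgs.append((ts, int(m.group("block")), m.group("hash")))
    return reorgs


sample = [
    "2017-09-04 09:30:38  Imported #4236765 2d3e…f35e (63 txs, 2.86 Mgas, 1266.32 ms, 9.94 KiB)",
    "2017-09-04 09:30:59  Reorg to #4236765 0cca…518a (2d3e…f35e #4236764 2a6f…58cb )",
    "2017-09-04 09:40:53  Reorg to #4236788 195e…132b (9cba…ad41 #4236787 fafe…af79 )",
]
reorgs = parse_reorgs(sample)
# Seconds elapsed between consecutive reorgs.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(reorgs, reorgs[1:])]
print(len(reorgs), "reorgs; seconds between them:", gaps)
```

Running this over the whole log file instead of `sample` would show how often reorgs occur, which helps judge whether a single reorg is a plausible trigger for a spike.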

@rphmeier
Contributor

rphmeier commented Sep 4, 2017

The thing about archiveDB is that it keeps everything. It will keep the full state of all blocks processed, even those which are eventually reorganized out of the chain. I'm not sure how well rocksdb handles having that much data, but it will definitely put a strain on your storage. I am not sure that even a specialized database (which we are in the process of building) would alleviate this much.

Something more useful might be a semi-pruned mode, where we discard non-canonical states after a certain point, but keep all state of canonical blocks.

@jo-tud

jo-tud commented Sep 15, 2017

I created a chart of the chain folder size while syncing parity (v1.6.0) in --pruning fast and --pruning archive mode.

A) For the purple line the zig-zag is due to the regular pruning that occurs, right?
B) But what I don't understand is the spikes in the blue line, in archive mode. It also looks like some type of pruning, but there shouldn't be any, right?

Imgur

Higher resolution: https://imgur.com/a/cx9et
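
A chart like this can be reproduced by sampling the size of the chain folder at a fixed interval. Below is a minimal sketch, not the script used for the chart above; the function names are hypothetical and the path you pass in is whatever directory your node stores its database in.

```python
import os
import time


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can disappear mid-walk while the database compacts.
                pass
    return total


def sample_sizes(path, samples, interval_s):
    """Return a list of (unix_time, size_in_bytes), one entry per sample."""
    points = []
    for i in range(samples):
        points.append((time.time(), dir_size_bytes(path)))
        if i < samples - 1:
            time.sleep(interval_s)
    return points
```

For example, `sample_sizes(chain_dir, samples=60, interval_s=60)` would collect an hour of one-minute data points, which can then be plotted with any charting tool.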

@c0deright
Author

This seems to be the result of compacting the RocksDB. RocksDB writes a LOT to disk and from time to time this is compacted, resulting in massive disk usage drops.

@ewiner

ewiner commented Oct 12, 2017

I'm experiencing the exact same thing as @jlopp, even wrote a similar script.

@jlopp

jlopp commented Oct 12, 2017

Just to follow up on this, we were syncing Parity 1.7.2 with a 500 GB disk. Eventually we increased it to a 1 TB disk and were able to complete the sync. So there definitely appears to be a huge inefficiency somewhere that is causing the disk usage to be far higher than it needs to be. I just checked and one of our nodes that is still syncing is using 660GB of disk space, but if I restart parity it drops to 300GB.

@ewiner

ewiner commented Oct 12, 2017

Yep. Looks like a more permanent fix has been pushed out to 1.9 - as an interim solution, is there some way that Parity could trigger DB compaction more frequently, instead of having to stop and restart the process?

@5chdn
Contributor

5chdn commented Oct 12, 2017

We are looking for a more permanent solution for this and have started working on our own database implementation: https://github.com/debris/paritydb/

But 1.8 is about to be released very soon, therefore I modified the milestone.

@jlopp

jlopp commented Oct 12, 2017

Cool; worth noting that I ran into similar issues with Ripple nodes - they also use RocksDB by default. Ripple ended up writing their own DB called NuDB and when we switched to it, the problems were fixed.

@ewiner

ewiner commented Oct 13, 2017

In case some weary traveler with finite disk space happens upon this github ticket before v1.9 comes out, here's my simple script to get sync working on a Mac:

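import subprocess
import os
import time
import signal

# Restart Parity whenever free disk space runs low, since disk usage
# drops back down after a restart.
while True:
    print("Running Parity...")
    proc = subprocess.Popen(['parity', '--tracing', 'on', '--pruning', 'archive'])
    print("Parity running with pid {0}".format(proc.pid))
    while True:
        time.sleep(30)
        # Free space on the root volume: https://stackoverflow.com/a/787832
        s = os.statvfs('/')
        gigs_left = (s.f_bavail * s.f_frsize) / 1024 / 1024 / 1024
        print('{0:.1f} GB left'.format(gigs_left))
        # Stop and restart once less than 90 GB is left.
        if gigs_left < 90:
            break
    print("Terminating Parity...")
    # SIGINT asks Parity to shut down; wait for it to exit before restarting.
    proc.send_signal(signal.SIGINT)
    proc.wait()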

@tjayrush

Has anyone suggested an archive mode that stores only the balances at each block?

I'm working on a fully decentralized accounting/auditing project that has been working fine since summer 2016, but over the last few weeks, Parity has been constantly failing because its disk usage grows from 400GB to over 800GB about twice a day. This blows out my 1TB drive.

The recent article about the chain's size argues that the archive mode is unneeded (and does not increase security) because one can always rebuild the state by replaying transactions. This is true, and a perfectly legitimate position, but it misses a point. Without some source of a "double-check" that the rebuilding of state from transactions is accurate, it's impossible to have any faith in the results. You can end up at the end of the process with the same state, but what happens if you don't? You have a bug, but without an archive of previous states, finding that bug is impossible.

It would help if there were a mode where, at each block, my code (which is building state from transaction history) could double-check that it's correct and quickly identify problems. I know that some addresses don't even carry a balance, so this doesn't work for every address, but it would work for "accounting", where balances are all that really matters.

Upshot: add a feature called --pruning archive-balances that only stores balances per block.
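
To illustrate the kind of double-check this would enable, here is a toy sketch. It assumes a hypothetical per-block balance archive and models a block as a plain list of transfers, ignoring gas, mining rewards, and contract execution, so it shows the idea rather than how Parity or Ethereum actually work. `first_divergent_block` is an illustrative name.

```python
def first_divergent_block(blocks, archived_balances, initial):
    """Replay transfers block by block and compare against archived balances.

    blocks: list of blocks, each a list of (sender, receiver, amount) transfers.
    archived_balances: list of {address: balance} dicts, one per block.
    initial: {address: balance} before the first block.
    Returns the index of the first block whose replayed balances differ
    from the archive, or None if every block matches.
    """
    balances = dict(initial)
    for number, (transfers, expected) in enumerate(zip(blocks, archived_balances)):
        for sender, receiver, amount in transfers:
            balances[sender] = balances.get(sender, 0) - amount
            balances[receiver] = balances.get(receiver, 0) + amount
        if balances != expected:
            return number
    return None


initial = {"a": 10, "b": 0}
blocks = [[("a", "b", 3)], [("b", "a", 1)]]
archive = [{"a": 7, "b": 3}, {"a": 8, "b": 2}]
print(first_divergent_block(blocks, archive, initial))
```

With an archive like this, a replay bug is pinned to the first block where the balances disagree instead of only showing up as a wrong final state.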

@tomusdrw
Collaborator

Also a possible solution would be to store a checkpoint state every X blocks and recompute state from the closest checkpoint not from the genesis.
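
The checkpoint idea can be sketched with a toy model. Everything here is an assumption made for illustration: state is a dictionary of balances, a block is a list of transfers, and the function names are hypothetical. None of it reflects Parity's actual state representation.

```python
def apply_block(state, transfers):
    """Return a new balance dict after applying one block of transfers."""
    new_state = dict(state)
    for sender, receiver, amount in transfers:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def build_checkpoints(genesis, blocks, interval):
    """Store the state after every `interval` blocks; key 0 is the genesis state."""
    checkpoints = {0: dict(genesis)}
    state = genesis
    for height, transfers in enumerate(blocks, start=1):
        state = apply_block(state, transfers)
        if height % interval == 0:
            checkpoints[height] = state
    return checkpoints


def state_at(height, blocks, checkpoints):
    """Recompute the state at `height` from the closest checkpoint at or below it."""
    start = max(h for h in checkpoints if h <= height)
    state = checkpoints[start]
    # blocks[start:height] are exactly the blocks applied after the checkpoint.
    for transfers in blocks[start:height]:
        state = apply_block(state, transfers)
    return state


genesis = {"a": 100}
blocks = [[("a", "b", 1)] for _ in range(10)]
checkpoints = build_checkpoints(genesis, blocks, interval=4)
print(sorted(checkpoints), state_at(6, blocks, checkpoints))
```

The trade-off is in `interval`: a larger value stores fewer states but means more blocks have to be replayed for each historical query.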

@tjayrush

I asked a question a couple of days ago about the snapshot in Parity: (1) does the snapshot work even if one is not using archive mode, and (2) can I get at the data in the snapshot? If there's a continuum from full archive mode to warp mode, storing just balances would be closer to full archive and giving access to snapshots would be closer to warp mode. Both would work. Balances at every block would be easier for my work, but either would be welcome because full archive is a real problem.

@5chdn
Contributor

5chdn commented Jan 3, 2018

🎉

@tomachinz

Why do we care about old state anyway? Do smart contracts look back in time - I thought they can only see the blockchain and receipts? I don't believe they need to be able to see the full history of each account, perhaps I'm wrong though. Nodes run synchronously and check the current state of current variables. My understanding is that by having the full state you can run any and all transactions and therefore collect transaction gas fees etc... with a partial database you could not run all transactions broadcast... but you could still run a lot. Having said that, I reckon it would be feasible to write a node that selectively dumps massive chunks of old unused state, performing some kind of opinionated and largely negative analysis of the chance of a future transaction ever happening. Cos there must be a fair amount of crunk junk data in there. Plus way too much use of int256 means a lot of 0x000000 in front of yer digits.
