
Parity long hangs/stalls during syncing upon "Block queue full, pausing sync" (Windows) #6449

Closed
jaredohmni opened this issue Sep 3, 2017 · 3 comments
Labels
Z1-question 🙋‍♀️ Issue is a question. Closer should answer.

Comments

@jaredohmni

I'm running:

  • Parity version: Parity/v1.7.0-beta-5f2cabd-20170727/x86_64-windows-msvc/rustc1.18.0
  • Operating system: Windows 10
  • And installed: via installer

I've been running parity in --pruning=archive mode trying to do a full sync and noticed extreme slowness that isn't related to the DDOS/attack blocks. Specifically, after getting to around block 3,000,000+, I noticed that parity would provide normal sync outputs for a few intervals (about 5-15s apart), then suddenly hang for around 5-10 minutes before the next report.

I've already read through many related threads, and this doesn't look like the standard transaction spam/Sweeper.sol issues or ICO transaction bloat. In particular, if I kill parity and start it again, it proceeds at full speed for anywhere from 1,000 to 10,000 blocks before running into this state again.

I then ran with -lsync=trace to capture logs (attached). I noticed that the hangs seem to be correlated with the "Block queue full, pausing sync" message:

2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queued a9579bc3853cf104f9c5d6cbbd6c2f492974b1816cee9b1a358f14b4fa4a2fd0
2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queued a7bffd4fb70cc5c43f1f4a4aeffde9d40069dd36709ec90d60044d9dc446017e
2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queued 58b3ea0ce4947ba9526c1670342cbba86d42041cae3fa24fff388d5abf2f00e1
2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queued c25232022d55e56cab34d8b1c1d150be2454117bf5304e87324f3257eeb3c75f
2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queued b482cef0d80071304bc859a1399035578307404978b6d6fbd95936017ab1c829
2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queued 6dd285f701e5915e3d648eaf281e4a6a33d828aac3424621d6051b092d0053c0
2017-09-03 12:31:41 IO Worker #2 TRACE sync Imported 767 of 767
2017-09-03 12:31:41 IO Worker #2 TRACE sync Block queue full, pausing sync
2017-09-03 12:31:41 IO Worker #2 TRACE sync Syncing with peers: 11 active, 9 confirmed, 11 total
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 2
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 22
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 36
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 0
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 27
2017-09-03 12:31:41 IO Worker #2 TRACE sync Waiting for the block queue
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 5
2017-09-03 12:31:41 IO Worker #2 TRACE sync Skipping busy peer 42
2017-09-03 12:31:41 IO Worker #2 TRACE sync Waiting for the block queue
2017-09-03 12:31:41 IO Worker #2 TRACE sync 56 -> GetBlockHeaders (number: 3920498, max: 192, skip: 0, reverse:false)
2017-09-03 12:31:41 IO Worker #2 TRACE sync 56 -> GetBlockHeaders: returned 0 entries
2017-09-03 12:31:41 IO Worker #2 TRACE sync New peer 47 (protocol: 2, network: 1, difficulty: Some(108561418780672567371), latest:2c16…2c48, genesis:d4e5…8fa3, snapshot:Some(3570000))
2017-09-03 12:31:41 IO Worker #2 DEBUG sync Connected 47:Parity/v1.6.6-beta-8c6e3f3-20170411/x86_64-linux-gnu/rustc1.16.0
2017-09-03 12:31:41 IO Worker #2 TRACE sync 47 <- GetForkHeader: at 1920000
2017-09-03 12:31:41 IO Worker #1 TRACE sync 57: Confirmed peer
2017-09-03 12:31:41 IO Worker #1 TRACE sync Waiting for the block queue

It's unclear to me whether this is a symptom or the root cause, but it seems highly correlated. As a short-term workaround I'm running with --cache-size-blocks=4096 and I'm not seeing these hangs anymore. Perhaps there's some signaling issue or IO spinning being triggered in this case? I also notice that during these stuck periods the disk shows 90-100% utilization, mostly in reads, whereas during normal syncing the utilization is mostly in writes.
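
(I was just watching the OS resource monitor for the read/write split above, but a minimal Python sketch along the lines below would capture the same numbers; psutil and the 5-second interval are my own assumptions, not part of the setup described here.)

import time
import psutil  # assumption: psutil is installed; this is illustrative, not the tool I used

prev = psutil.disk_io_counters()
while True:
    time.sleep(5)
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1e6
    write_mb = (cur.write_bytes - prev.write_bytes) / 1e6
    print(f"last 5s: {read_mb:8.1f} MB read, {write_mb:8.1f} MB written")
    prev = cur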

-lsync=trace log attached for two runs showing this behavior:
parity-sync.log.txt
parity-sync2.log.txt
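
For reference, a rough sketch of how these logs could be scanned for long gaps between trace timestamps and checked for a preceding "Block queue full, pausing sync" line. It assumes the timestamp format shown above; the 60-second gap threshold and 50-line lookback window are arbitrary choices of mine:

import sys
from collections import deque
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"    # timestamp format used by the trace lines above
recent = deque(maxlen=50)    # short window of lines seen before the current one
prev_ts = None

for line in open(sys.argv[1], errors="replace"):
    try:
        ts = datetime.strptime(line[:19], FMT)
    except ValueError:
        continue             # skip lines without a leading timestamp
    if prev_ts is not None:
        gap = (ts - prev_ts).total_seconds()
        if gap > 60:         # treat anything over a minute as a hang
            paused = any("Block queue full, pausing sync" in l for l in recent)
            print(f"{prev_ts}: {gap:.0f}s gap; queue-full seen shortly before: {paused}")
    recent.append(line)
    prev_ts = ts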


@jaredohmni
Author

Update: actually, I am still seeing the same kinds of periodic hangs even with --cache-size-blocks=2048, so there may be something else going on here. Are there any other trace logs that I should enable to help pinpoint this?

I'm also still seeing that killing parity and restarting it immediately makes forward progress past where it was previously stuck, so it still looks like some odd internal state is involved.

@5chdn
Contributor

5chdn commented Sep 4, 2017

That's being worked on; please subscribe to #6280.

Ethereum blockchain utilization beyond block 3_000_000 is currently a problem for any full client.

Are you on SSD or HDD? Do you use the UI while this happens?

@5chdn 5chdn added the Z1-question 🙋‍♀️ Issue is a question. Closer should answer. label Sep 4, 2017
@jaredohmni
Author

Thank you - I'm closing this issue and subscribing to #6280, as I agree that's likely the root cause. I've tested this on Windows 10 (with a mid-range SSD) and on a Linux box with a higher-end NVMe SSD, and both exhibit the same write-amplification issues, which get worse around block 3_000_000+ as you mentioned. There's no UI usage in either case; I'm running straight from the command line without the UI or any RPC enabled.

As a hack, I ended up writing a script to monitor disk usage and kill/restart parity just before running out of disk space. It's working well enough for now and I'll follow progress on #6280 for the actual fix.
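
For what it's worth, the hack amounts to something like the sketch below; the actual script differs, and the parity command line, path, and threshold here are placeholders rather than my real values.

import shutil
import subprocess
import time

PARITY_CMD = ["parity", "--pruning=archive"]   # placeholder, not my actual command line
DATA_PATH = "C:\\"                             # placeholder: drive holding the chain data
MIN_FREE_GB = 20                               # placeholder free-space threshold

proc = subprocess.Popen(PARITY_CMD)
while True:
    time.sleep(60)
    free_gb = shutil.disk_usage(DATA_PATH).free / 1e9
    if free_gb < MIN_FREE_GB:
        proc.terminate()                       # stop parity before the disk fills up
        proc.wait()
    if proc.poll() is not None:                # restart if parity exited (or we stopped it)
        proc = subprocess.Popen(PARITY_CMD)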

Thanks for your hard work on such an important part of the Ethereum ecosystem! If you need any other data collected or patches tested for #6280, feel free to ping me and I'd be happy to help.
