This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Ancient block sync stalls #7008

Closed
ethernian opened this issue Nov 9, 2017 · 19 comments · Fixed by #9531
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible.
Comments

@ethernian

I'm running:

  • Parity version: Parity/v1.8.2-beta-1b6588c-20171025/x86_64-linux-gnu/rustc1.21.0
  • Operating system: Linux
  • And installed: sudo apt-get install ./parity_1.8.2_amd64.deb

Configuration detail: the chains directory is symlinked to another drive.

human@EtherBox:~$ ls -lisa ./.local/share/io.parity.ethereum/chains
524382 0 lrwxrwxrwx 1 human human 28 Sep  3 21:25 ./.local/share/io.parity.ethereum/chains -> /media/human/CHAIN_DB/chains

After warp sync, I'm unable to get all missing blocks fetched. After each new start, Parity forgets the already fetched blocks and starts downloading them again and again.

Here is a sample of 3 Parity runs:

[screenshot: block numbers downloaded in three consecutive Parity runs]

Here is the log file from the runs above (as a text file):
parity-3x-sync.txt


@Office-Julia Office-Julia added the Z0-unconfirmed 🤔 Issue might be valid, but it’s not yet known. label Nov 9, 2017
@roninkaizen

roninkaizen commented Nov 10, 2017

Normal behaviour / a misunderstanding.
Looking at the screenshot you posted, it becomes obvious:

confirmed = yes
unwanted, unintended = no
irritating = yes
an annoyance = no

This is normal behavior; others count on the transfers too and are fully synced on their nodes.

Would somebody from Parity please document (declare) this behavior, so that it is known to be normal?
thx

@5chdn 5chdn added M4-core ⛓ Core client code / Rust. Z1-question 🙋‍♀️ Issue is a question. Closer should answer. and removed Z0-unconfirmed 🤔 Issue might be valid, but it’s not yet known. labels Nov 10, 2017
@5chdn
Contributor

5chdn commented Nov 10, 2017

That's the ancient block download.

Warp-sync downloads the latest snapshot and the last 30k blocks. After that, it starts downloading the full blockchain (yellow numbers).
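
For anyone watching the progress: the ancient download can be checked over RPC. A minimal sketch, assuming the default HTTP-RPC port 8545 (recent builds also report the remaining blockGap in the reply, depending on version):

  curl -X POST -H "Content-Type: application/json" \
       --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
       http://localhost:8545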

@5chdn 5chdn closed this as completed Nov 10, 2017
@ethernian
Author

ethernian commented Nov 10, 2017

@5chdn @roninkaizen
What is the reason to download ancient blocks again and again? This is the issue reported.

Please check the block numbers:
1st run: downloaded ancient blocks from #3574411 to #3614671
2nd run: downloaded ancient blocks from #3573015 to #3576698
3rd run: downloaded ancient blocks from #3573269 to #3575935

Ancient blocks are being downloaded in the same range again and again.
What is the reason to do so, if it is not a bug?
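
For reference, the ranges above can be pulled out of the attached logs roughly like this. A sketch only: the ancient:#<n> token is my assumption about the informant's output format, so adjust the pattern to the actual log lines:

  # lowest and highest ancient block number seen in one run's log
  grep -oE 'ancient:#[0-9]+' parity-3x-sync.txt | cut -d'#' -f2 | sort -n | sed -n '1p;$p'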

@arkpar arkpar reopened this Nov 11, 2017
@arkpar
Collaborator

arkpar commented Nov 11, 2017

This is really weird indeed. Never seen this before. Could you restart with -l sync=trace and post logs?
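
For example, something like:

  # restart with sync tracing written to a file (log path is just a suggestion)
  parity --log-file ~/parity-trace.log -l sync=trace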

@ethernian
Author

Yes, please:
Here are 4 subsequent runs; the last two are with -l sync=trace.

[screenshot: block numbers from four consecutive runs]

Trace logs:
parity-trace.log.zip

@5chdn 5chdn added this to the 1.9 milestone Nov 13, 2017
@5chdn 5chdn added F2-bug 🐞 The client fails to follow expected behavior. P5-sometimesoon 🌲 Issue is worth doing soon. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. and removed Z1-question 🙋‍♀️ Issue is a question. Closer should answer. P5-sometimesoon 🌲 Issue is worth doing soon. labels Nov 13, 2017
@5chdn 5chdn changed the title Warp sync: after restart parity forgets already fetched blocks Ancient block sync stalls Nov 14, 2017
@5chdn
Contributor

5chdn commented Nov 14, 2017

Having similar issues with 1.8.2

@5chdn 5chdn mentioned this issue Nov 16, 2017
@5chdn 5chdn modified the milestone: 1.9 Dec 6, 2017
@5chdn 5chdn added P0-dropeverything 🌋 Everyone should address the issue now. and removed P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. labels Jan 3, 2018
@5chdn
Contributor

5chdn commented Jan 3, 2018

Can we also make sure we do not purge ancient blocks whenever a warpsync kicks in?
Edit: #6350

@tomusdrw
Collaborator

tomusdrw commented Jan 5, 2018

As a workaround:

  1. Remove database
  2. Warp again to latest snapshot (hopefully warp health improves soon with new bootnodes being rolled out)
  3. Download ancient blocks again.
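
In shell terms, roughly (a sketch, assuming a default data directory; parity db kill should be equivalent to removing the chain database by hand):

  # 1. stop the node, then drop the chain database
  parity db kill
  # 2. restart; warp sync is on by default, fetches the latest snapshot,
  #    and ancient blocks are then downloaded again in the background
  parity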

Can we also make sure we do not purge ancient blocks whenever a warpsync kicks in?

@5chdn That's a separate issue, can you log it?

@tomusdrw tomusdrw added P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. and removed P0-dropeverything 🌋 Everyone should address the issue now. labels Jan 5, 2018
@5chdn
Contributor

5chdn commented Jan 5, 2018

@tomusdrw I think it is this one: #6350

@lght lght self-assigned this Jan 9, 2018
@lght

lght commented Jan 11, 2018

As a workaround:

Remove database
Warp again to latest snapshot (hopefully warp health improves soon with new bootnodes being rolled out)
Download ancient blocks again.

👍 for this workaround

With a 1.10.0 nightly build, warp took only ~40min from a fresh db!

Peer connections are still really volatile; they just dropped to ~10 peers. I got to a max of 25 peers, but spend most of the time with 1-5 peers.

Will report back with sync status; currently at block 4880220, warp synced to block 4880000.

Update: confirmed, fully synced from scratch using warp and fast compaction!! Took ~30 hrs from start to fully synced.

@lght

lght commented Jan 13, 2018

@arkpar is it possible that part of @dip239's issue is that a corrupt block in the import round he was working on caused all blocks in that round to be rolled back on restart?

@5chdn
Contributor

5chdn commented Jan 16, 2018

@lght, this issue is about the ancient block sync that happens after the warp sync.

@GoodMirek

GoodMirek commented Jan 18, 2018

The issue "Ancient block sync stalls" has happened to me twice so far. The Parity and Linux version strings follow:

version Parity/v1.10.0-unstable-25b19835e-20180117/x86_64-linux-gnu/rustc1.22.1

Linux wedos1 4.14.13-300.fc27.x86_64 #1 SMP Thu Jan 11 04:00:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Parity has been started with command:

parity daemon /home/ether/parity.pid --log-file /home/ether/parity.log -l info --cache-size 4096 --max-peers 10 --min-peers 5

It seems to happen once per ~1.5M blocks (1st occurrence at block 1639023, 2nd occurrence at block 2996831). After the first occurrence, following a graceful restart (SIGHUP), the parity process continued to download ancient blocks from where it left off. There were parity process restarts in between the occurrences.

The second occurrence of the issue just happened, and I am keeping the process running in the stalled state. The blockchain head keeps syncing. If you need any further info from the stalled process or need access to the system, let me know (there is no private info on the system). If I do not hear back within 3 days, I plan to restart the process with TRACE log level. I expect the issue could happen again before all blocks are downloaded.

While it was in the stalled state, I took a core dump of the process using gcore. Attached is the gdb backtrace gdb_core_9187.txt. The log is at info level, so probably not useful. If you need the coredump file, I will share it (it contains no private info).
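
The dump was taken roughly like this (a sketch; the binary path and pid file are assumptions based on my setup above, and gcore names the core file parity_core.<pid>):

  # dump a core of the running process (pid from parity.pid)
  gcore -o parity_core "$(cat /home/ether/parity.pid)"
  # write backtraces of all threads to a text file
  gdb -batch -ex 'thread apply all bt' ~/parity/target/release/parity parity_core.<pid> > gdb_core_9187.txt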

If that helps, I can rebuild parity in a different way, e.g. with symbols and without optimizations. I would appreciate a hint on how to do that, as I have zero Rust knowledge.
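
If I understand the Cargo docs correctly, the default (dev) profile might already give exactly that. A sketch, assuming the source tree sits in ~/parity:

  # unoptimized build with debug symbols (cargo's default dev profile)
  cd ~/parity && cargo build
  # the binary then lands in target/debug/parity instead of target/release/parity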

@5chdn 5chdn modified the milestones: 1.9, 1.10 Jan 23, 2018
@5chdn 5chdn mentioned this issue Jan 26, 2018
@5chdn 5chdn modified the milestones: 1.10, 1.11 Mar 1, 2018
@5chdn 5chdn unassigned lght Mar 1, 2018
@5chdn
Contributor

5chdn commented Apr 5, 2018

This just happened to one of my fresh 1.10.0 nodes, if anyone wants to debug this.

@5chdn 5chdn modified the milestones: 1.11, 1.12 Apr 24, 2018
@folsen
Contributor

folsen commented May 20, 2018

@ngotchac can you double-check that #8642 addresses this and if so close the issue please.

@GoodMirek

GoodMirek commented May 20, 2018

I have tried twice with yesterday's commit 6552256.
In one case (an OpenVZ container) it never finished syncing the snapshot; in the other (a KVM VM) it got stalled after syncing the snapshot successfully, just before getting in sync with the network, while processing the last blocks in the queue.

It might be attributable to compilation with rustc 1.26. It also seems there is different memory allocation behavior when compiled with rustc 1.26 instead of 1.25: either there is a memory leak, or it cannot handle running in an OpenVZ container compared to running in a KVM VM.

Though running in an OpenVZ container is probably an uninteresting use case (I am using it because it comes at a good price), maybe it helps to know that the parity process on the containerized node always dies after some time, which is not the case when compiled with 1.25. The OpenVZ container correctly reports the total amount of memory available, but I am not sure whether it is able to indicate memory allocation failure via ENOMEM under memory pressure or just invokes the OOM killer.

I am running parity with this command:

~/parity/target/release/parity daemon ~/parity.pid --log-file ~/parity.log -l info --cache-size 2048 --cache-size-state 1024 --max-peers 200 --min-peers 25

in a container with 4 GB RAM. Setting cache-size-state to at least 512 makes a huge difference in performance in my case.
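
One way to check for a leak would be to log the resident size over time, e.g. (a sketch; pid file path as in my command above):

  # append parity's resident memory (kB) to a log once a minute
  while sleep 60; do ps -o rss= -p "$(cat ~/parity.pid)" >> parity_rss.log; done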

I am sorry I could not spend more time, as this is just a hobby. Currently, I am trying to run both my nodes with the vanilla 1.10.4 stable release binaries compiled with 1.25, downloaded from GitHub.

@folsen
Contributor

folsen commented May 20, 2018

@GoodMirek Thanks so much for this, it's really helpful; however, I don't think it's related to this issue. I'd also say that 4 GB RAM is probably not enough to run Parity with those caches; RAM usage can also depend a bit on peers, although not a ton. My own node consumes about 10 GB RAM with similar cache sizes, though that is still a bit above expectation for various reasons. Investigating running Parity under OpenVZ vs. a KVM VM would be a separate issue; it should work, but I personally don't know the requirements on the VM side.

@GoodMirek

@folsen The point is that on the same container, with the same command line options, a previous version built with rustc 1.25 ran just fine for more than a week. I do not remember which commit hash it was, but it was not more than a month old.
I know the previous build I ran suffered from this particular issue, "Ancient block sync stalls", as I had to restart the process several times to complete the download of old blocks.

@folsen
Contributor

folsen commented May 21, 2018

@GoodMirek Interesting, please open a separate issue for the difference between 1.25 and 1.26; we definitely don't want regressions between Rust versions.

@5chdn 5chdn modified the milestones: 2.0, 2.1 Jul 17, 2018
@5chdn 5chdn modified the milestones: 2.1, 2.2 Sep 11, 2018