This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

TdS consensus lost at slot 797572 #8189

Closed
mvines opened this issue Feb 10, 2020 · 6 comments

@mvines
Contributor

mvines commented Feb 10, 2020

The TdS cluster lost consensus over the weekend.

  • Last root slot: 797453
  • Metrics window. No unusual blockstore errors.
  • A ledger graph from around that time shows that the bootstrap validator (BSV) forked left at 797572 and the rest of the stake forked right: s-798000.pdf
@mvines mvines added this to the Tofino v0.23.3 milestone Feb 10, 2020
@mvines
Contributor Author

mvines commented Feb 10, 2020

I wonder if this is really just the same issue as #8174

@mvines
Contributor Author

mvines commented Feb 10, 2020

Bootstrap validator logs during this time window are at https://drive.google.com/drive/folders/1vj6uZuB5Lwn_bfWlPPMgg6WrnJTmGCI3?usp=sharing

Look for this line:

Feb  8 20:06:33 bootstrap-validator solana-validator[4674]: [2020-02-08T20:06:33.170791867Z INFO  solana_core::replay_stage] new fork:797572 parent:797571 (leader) root:797453

@carllin
Contributor

carllin commented Feb 11, 2020

@aeyakovenko, @mvines @sagar-solana

I think a clearer picture of what happened is presented here:
forks.pdf, starting at slot 797780. There are two forks:

  1. Containing the bootstrap leader with id: 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on

  2. All the other validators

These two forks run parallel to each other, seemingly without ever joining again.

The reason seems to be that the other validators are observing blocks from the bootstrap leader very late. A good example of this is slot 797891 on the bootstrap leader's fork.

This slot was frozen by the bootstrap leader here (search in the logs posted above by @mvines):

[2020-02-08T20:10:32.118871763Z INFO  solana_runtime::bank] bank frozen: 797891 hash: J3Q93eahnu3LxChj3HBuabbtykAZCMi4UyMQVy33RdGJ accounts_delta: BankHash 6e8de403db81ca3107df96352b060f8fc069350d20db7c0e85464614b28949d4 signature_count: 0 last_blockhash: 2c2LF7P1Ujt4zperdQNjtmQCfortn1Kw7ww4YPh3HtnC

Meanwhile, one of the other validators' logs (https://drive.google.com/open?id=1_O9LPuWWPwFwI_Yu6rLzGG_cDQ2jHDtV) indicates it received the block just over 5 minutes later (20:15:43 vs. 20:10:32):

 [2020-02-08T20:15:43.830075155Z INFO  solana_runtime::bank] bank frozen: 797891 hash: J3Q93eahnu3LxChj3HBuabbtykAZCMi4UyMQVy33RdGJ accounts_delta: BankHash 6e8de403db81ca3107df96352b060f8fc069350d20db7c0e85464614b28949d4 signature_count: 0 last_blockhash: 2c2LF7P1Ujt4zperdQNjtmQCfortn1Kw7ww4YPh3HtnC
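
As a rough check on that delay, the two timestamps can be subtracted directly. This is a minimal Python sketch that assumes only the log format visible in the two quoted lines; the helper name `parse_frozen` is made up for illustration, and the lines are truncated after the hash.

```python
import re
from datetime import datetime, timezone

# The two "bank frozen" lines quoted in this comment, truncated after the hash.
BOOTSTRAP_LINE = (
    "[2020-02-08T20:10:32.118871763Z INFO  solana_runtime::bank] "
    "bank frozen: 797891 hash: J3Q93eahnu3LxChj3HBuabbtykAZCMi4UyMQVy33RdGJ"
)
VALIDATOR_LINE = (
    "[2020-02-08T20:15:43.830075155Z INFO  solana_runtime::bank] "
    "bank frozen: 797891 hash: J3Q93eahnu3LxChj3HBuabbtykAZCMi4UyMQVy33RdGJ"
)

# Assumed format: [<date>T<time>.<fraction>Z <LEVEL> solana_runtime::bank] bank frozen: <slot> hash: <hash>
FROZEN_RE = re.compile(
    r"\[(?P<date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(?P<frac>\d+)Z\s+\w+\s+"
    r"solana_runtime::bank\] bank frozen: (?P<slot>\d+) hash: (?P<hash>\w+)"
)


def parse_frozen(line):
    """Return (slot, hash, unix_seconds) for a 'bank frozen' log line."""
    m = FROZEN_RE.search(line)
    if m is None:
        raise ValueError("not a 'bank frozen' line")
    whole = datetime.strptime(m["date"], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # The logs carry nanoseconds; add the fraction back as a float.
    seconds = whole.timestamp() + int(m["frac"]) / 10 ** len(m["frac"])
    return int(m["slot"]), m["hash"], seconds


slot_a, hash_a, frozen_on_bootstrap = parse_frozen(BOOTSTRAP_LINE)
slot_b, hash_b, frozen_on_validator = parse_frozen(VALIDATOR_LINE)

# Same slot and same bank hash on both nodes, so this is the same block.
same_block = (slot_a, hash_a) == (slot_b, hash_b)
delay_seconds = frozen_on_validator - frozen_on_bootstrap
print(f"slot {slot_a}: same block={same_block}, delay={delay_seconds:.1f}s")
```

The matching slot and hash confirm both nodes froze the same block; only the time differs.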

At the time this validator finally observed 797891 on fork #1, it also simultaneously observed a heavier block at 798418 on fork #2 (see below for the weight comparisons) so the validator doesn't switch:

On fork 1 at slot 797891

[2020-02-08T20:15:43.984559990Z WARN  solana_core::replay_stage] 2X5JSTLN9m2wm3ejCxfWRNMieuC2VMtaMWSoqLPbC4Pq slot_weight: 797891 5090111468178396653582698 1629253900651096942209625508730 797890

On fork 2 at slot 798418

[2020-02-08T20:15:43.986858761Z WARN  solana_core::replay_stage] 2X5JSTLN9m2wm3ejCxfWRNMieuC2VMtaMWSoqLPbC4Pq slot_weight: 798418 5090110939440567835796250 1629292802660799684221169001352 798417
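
The comparison can be reproduced from the two `slot_weight` lines. This is a sketch under an assumption: the thread does not name the four numbers after `slot_weight:`, so the code treats them as slot, the slot's own weight, the cumulative fork weight, and the parent slot. That reading fits the statement that fork 2 is heavier, because only the fourth-from-left large number is bigger on fork 2. The names `parse_slot_weight` and `fork_weight` are illustrative.

```python
import re

FORK1_LINE = (
    "[2020-02-08T20:15:43.984559990Z WARN  solana_core::replay_stage] "
    "2X5JSTLN9m2wm3ejCxfWRNMieuC2VMtaMWSoqLPbC4Pq slot_weight: 797891 "
    "5090111468178396653582698 1629253900651096942209625508730 797890"
)
FORK2_LINE = (
    "[2020-02-08T20:15:43.986858761Z WARN  solana_core::replay_stage] "
    "2X5JSTLN9m2wm3ejCxfWRNMieuC2VMtaMWSoqLPbC4Pq slot_weight: 798418 "
    "5090110939440567835796250 1629292802660799684221169001352 798417"
)

WEIGHT_RE = re.compile(r"slot_weight: (\d+) (\d+) (\d+) (\d+)")


def parse_slot_weight(line):
    """Split a slot_weight line into its four numeric fields (names are assumed)."""
    m = WEIGHT_RE.search(line)
    if m is None:
        raise ValueError("not a slot_weight line")
    # Python ints are arbitrary precision, so values beyond 64 bits are safe.
    slot, weight, fork_weight, parent = (int(g) for g in m.groups())
    return {"slot": slot, "weight": weight, "fork_weight": fork_weight, "parent": parent}


fork1 = parse_slot_weight(FORK1_LINE)
fork2 = parse_slot_weight(FORK2_LINE)

# Pick the candidate with the larger (assumed) cumulative fork weight.
heavier = max(fork1, fork2, key=lambda f: f["fork_weight"])
print("heavier fork tip:", heavier["slot"])
print("fork_weight gap:", fork2["fork_weight"] - fork1["fork_weight"])
```

Note that the slot's own weight (the first large number) is actually slightly higher for 797891, so the decision here hinges on the cumulative value.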

@mvines
Contributor Author

mvines commented Mar 2, 2020

@carllin - is there any action here still?

@carllin
Contributor

carllin commented Mar 2, 2020

@mvines, yup, summarized by this issue: #8232

@mvines
Contributor Author

mvines commented Mar 2, 2020

Obsoleted by #8232

@mvines mvines closed this as completed Mar 2, 2020
@mvines mvines modified the milestones: v0.23.9, v1.0.2 Mar 3, 2020