Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator missed blocks count is incorrect after decreasing slashing window #12541

Open
4 tasks
mintthemoon opened this issue Jul 12, 2022 · 13 comments
Open
4 tasks

Comments

@mintthemoon
Copy link

Summary of Bug

After lowering the slashing window from 35000 to 18000 on the Kujira blockchain some validators are stuck with a missed block count that won't clear even if they don't miss any more blocks. Querying all blocks in the signing window for some validators (e.g. kujiravaloper1pshqems6hdka48gc56r2ykshyaarkt40hl0rlh) shows no misses while kujirad query slashing signing-info reports a positive count.

I believe the issue is here:

signInfo, found := k.GetValidatorSigningInfo(ctx, consAddr)
if !found {
panic(fmt.Sprintf("Expected signing info for validator %s but not found", consAddr))
}
// this is a relative index, so it counts blocks the validator *should* have signed
// will use the 0-value default signing info if not present, except for start height
index := signInfo.IndexOffset % k.SignedBlocksWindow(ctx)
signInfo.IndexOffset++
// Update signed block bit array & counter
// This counter just tracks the sum of the bit array
// That way we avoid needing to read/write the whole array each time
previous := k.GetValidatorMissedBlockBitArray(ctx, consAddr, index)
missed := !signed
switch {
case !previous && missed:
// Array value has changed from not missed to missed, increment counter
k.SetValidatorMissedBlockBitArray(ctx, consAddr, index, true)
signInfo.MissedBlocksCounter++
case previous && !missed:
// Array value has changed from missed to not missed, decrement counter
k.SetValidatorMissedBlockBitArray(ctx, consAddr, index, false)
signInfo.MissedBlocksCounter--
default:
// Array value at this index has not changed, no need to update counter
}

This code updates the missed block count by checking the current value against the previous value at that index. The problem is any blocks missed at indexes above 18000 in the bit array are "lost" when the window changes because they'll contribute to the count but never be checked again. That explains the behavior we're seeing where multiple validators have miss counts that won't go below where they were on 7/3 when the parameter change took effect.

Version

v0.45.4

Steps to Reproduce

Cause a validator to miss blocks distributed throughout the slashing window and then lower the slashing window parameter for the chain. Compare manual missed block counts from checking signatures against what's reported by query slashing signing-info.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Copy link
Contributor

alexanderbez commented Jul 13, 2022

Yeah this is a problem due to "stateless"-parameter changes, i.e. params change w/o having any possible desired side affects. Unfortunately, I don't think there's much you can do in this case.

This is why we're deprecating x/params if favor of module-managed parameters, see: #10514

In this particular case, you sort of have three options:

  • Adjust/fix these windows in an upgrade
  • Fork x/slashing and take the approach we're taking in the SDK (if you don't want to wait for the next release)
  • Wait till the next release :p

@mintthemoon
Copy link
Author

Good to hear it's on the radar already! The new direction makes sense but I'm not quite following the implementation details. Is ParamChangeProposal being deprecated and if yes how would the slashing window be changed under the new model?

Waiting for the release probably makes sense for us (I'd love to contribute if there's anything a relative beginner could tackle). If there's a simple way to reset the current miss counts to 0 so they'd fill in properly from then on that would be ideal, but we at least can do manual calculations for missed blocks in the explorer as a workaround.

@alexanderbez
Copy link
Contributor

alexanderbez commented Jul 13, 2022

Is ParamChangeProposal being deprecated and if yes how would the slashing window be changed under the new model?

Yes and here. This is where we can perform all sorts of side effects as we have access to the entire keeper. Note, we currently dont have any logic, here but we need to!

If there's a simple way to reset the current miss counts to 0 so they'd fill in properly from then on that would be ideal, but we at least can do manual calculations for missed blocks in the explorer as a workaround.

I think you have to perform an upgrade to fix them :/

@mintthemoon
Copy link
Author

Yes and here. This is where we can perform all sorts of side effects as we have access to the entire keeper. Note, we currently dont have any logic, here but we need to!

Cool, I spotted that function earlier and it makes sense to put the side effects there. How does that function get called via governance though?

I think you have to perform an upgrade to fix them :/

That's unfortunate but I'm sure we'll figure it out, appreciate your help.

@alexanderbez
Copy link
Contributor

alexanderbez commented Jul 13, 2022

How does that function get called via governance though?

Governance just executes a series of one or more sdk.Msg types. So when you submit a proposal with a x/slashing MsgUpdateParams msg, that method gets executed.

The issue still stands though -- we need to figure out what kind of logic we need to perform when updating these params.

@mintthemoon
Copy link
Author

Governance just executes a series of one or more sdk.Msg types. So when you submit a proposal with a x/slashing MsgUpdateParams msg, that method gets executed.

Got it, thanks!

The issue still stands though -- we need to figure out what kind of logic we need to perform when updating these params.

Yeah, so in this case (the slashing window) I see two options: we can either set all missed counts to 0 at update time and allow a slight blip in the data, or else we need to recalculate the totals by checking every block in the window. I'm not sure what the performance requirements are for these updates but I'm guessing the second option might be a bit heavy.

@alexanderbez
Copy link
Contributor

Yeah true, but this only gets executed on proposal updates as a proposal, so a slight performance hit isn't too much of a concern.

@tac0turtle
Copy link
Member

now that params have migrated to modules we can tackle this issue.

@alexanderbez
Copy link
Contributor

Yes, the UpdateParams method on x/slashing will have to "migrate" all the signing bitmaps. This could be costly, but I don't see a way around it. The algo would be along the lines of:

  • For each validator/signing-info, get the missed blocks
  • For each validator/signing-info, clear the entire bitmap
  • For each validator/signing-info, set the new bitmap using the new signing window

@tac0turtle
Copy link
Member

tac0turtle commented May 5, 2023

by default i was thinking we could reset, we dont want the case of there are 5 validators who have 10k missed blocks and now the window is 5k and they get slashed right away. If you are fine with this i can make the change

@alexanderbez
Copy link
Contributor

In my personal opinion, such a change on a network means that the protocol is becoming more stringent with liveness guarantees, and thus I believe that validator should be slashed. However, that's just my opinion. Networks might see it differently. Also, what do you do in the case when the window increases? In that case, I'm even more firm on not reseting.

@tac0turtle
Copy link
Member

Yes, the UpdateParams method on x/slashing will have to "migrate" all the signing bitmaps. This could be costly, but I don't see a way around it. The algo would be along the lines of:

  • For each validator/signing-info, get the missed blocks
  • For each validator/signing-info, clear the entire bitmap
  • For each validator/signing-info, set the new bitmap using the new signing window

why would we need to get, clear and set? couldnt we iterate and clear anything past a certain window?

@alexanderbez
Copy link
Contributor

You need to resize the bitmap, requiring you to get all the validator's missed blocks.

If we go with the simple approach of just resetting, then the migration is dead simple -- just delete!

@tac0turtle tac0turtle removed this from Cosmos-SDK Aug 18, 2023
@github-project-automation github-project-automation bot moved this to 👀 To Do in Cosmos-SDK Nov 16, 2023
@tac0turtle tac0turtle moved this from 👀 To Do to 🧑‍🔧 Needs Design in Cosmos-SDK Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ☃️ Icebox
Development

No branches or pull requests

4 participants
@alexanderbez @tac0turtle @mintthemoon and others