Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Fix slashing period fetch #2305

Closed
wants to merge 3 commits into from

Conversation

cwgoes
Copy link
Contributor

@cwgoes cwgoes commented Sep 11, 2018

Closes #2224

  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added entries in PENDING.md with issue #

  • rereviewed Files changed in the github PR explorer


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@cwgoes
Copy link
Contributor Author

cwgoes commented Sep 11, 2018

cc @alexanderbez summarizing in case you want to pick this up:

Run make test_sim_gaia_fast | tee sim.out to reproduce the failure and dump a bunch of printed logs. This is clearly an iteration issue - looking at the final failure, the validator clearly has several slashing periods, yet somehow the ReverseIterator finds none of them.

Looking at types/store.go, the documentation for ReverseIterator is confusing - line 144 indicates that start must be greater than end, yet we pass on line 176 a start that is clearly less than end. I don't know if that's the problem (then how would the KVStoreReversePrefixIterator work?) but we should figure out the conflict, and then debug why the ReverseIterator isn't finding the slashing period.

Also, I tried using store/firstlast.go - but that didn't work either (same iteration failure). Pushed that to https://github.com/cosmos/cosmos-sdk/tree/cwgoes/fix-slashing-period-fetch-firstlast for reference.

I think ideally we should:

  1. Debug the iteration problem.
  2. Fix store/firstlast.go (or fix our usage of it).
  3. Move store/firstlast.go into types/ or alias it (hard to find)
  4. Update the slashing period logic (and similar) to use the common functions, I think we use an iterator like this in a few places

@alexanderbez
Copy link
Contributor

@cwgoes I will take a look at this and try to resolve any outstanding issues 👍

@codecov
Copy link

codecov bot commented Sep 12, 2018

Codecov Report

❗ No coverage uploaded for pull request base (develop@e5e7c4f). Click here to learn what that means.
The diff coverage is 41.66%.

@@            Coverage Diff             @@
##             develop    #2305   +/-   ##
==========================================
  Coverage           ?   63.87%           
==========================================
  Files              ?      140           
  Lines              ?     8711           
  Branches           ?        0           
==========================================
  Hits               ?     5564           
  Misses             ?     2771           
  Partials           ?      376

@cwgoes
Copy link
Contributor Author

cwgoes commented Sep 12, 2018

Merged develop, now fails (quickly) on block 281 with a slightly different error:

Panic with err
 slashing period ended before infraction: validator cosmosvalcons1dku9qrrg8qza0at3kqk7sq7076hugjyk7ugyag, infraction height 281, slashing period ended at 6
goroutine 13 [running]:
runtime/debug.Stack(0xc00469a420, 0x2, 0x2)
	/usr/lib/go/src/runtime/debug/stack.go:24 +0xa7
github.com/cosmos/cosmos-sdk/x/mock/simulation.SimulateFromSeed.func3(0xc000d1fc20, 0xc000106a00, 0xc00091cad0)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/x/mock/simulation/random_simulate_blocks.go:113 +0xb6
panic(0xd2c6e0, 0xc00167ee70)
	/usr/lib/go/src/runtime/panic.go:513 +0x1b9
github.com/cosmos/cosmos-sdk/x/slashing.Keeper.capBySlashingPeriod(0x10c9bc0, 0xc00091c9d0, 0xc000914770, 0x10d7c40, 0xc00007fc00, 0xc000914770, 0x10c9bc0, 0xc00091ca00, 0xa, 0x10d12a0, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/x/slashing/slashing_period.go:19 +0x3c8
github.com/cosmos/cosmos-sdk/x/slashing.Keeper.handleDoubleSign(0x10c9bc0, 0xc00091c9d0, 0xc000914770, 0x10d7c40, 0xc00007fc00, 0xc000914770, 0x10c9bc0, 0xc00091ca00, 0xa, 0x10d12a0, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/x/slashing/keeper.go:62 +0x64c
github.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0x10d12a0, 0xc001a65470, 0xc0000b7280, 0x23c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x119, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:34 +0x6cb
github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker(0xc0007e2780, 0x10d12a0, 0xc001a65470, 0xc0000b7280, 0x23c, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:143 +0x104
github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker-fm(0x10d12a0, 0xc001a65470, 0xc0000b7280, 0x23c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x119, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:114 +0xba
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(0xc0009321a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x119, 0x0, 0xc00f287c36, 0x181c880, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/baseapp/baseapp.go:419 +0x230
github.com/cosmos/cosmos-sdk/x/mock/simulation.SimulateFromSeed(0x10db6a0, 0xc0004c7100, 0xc0009321a0, 0xffeb50, 0x2a, 0xc0000cb600, 0xb, 0xb, 0xc00092bef8, 0x0, ...)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/x/mock/simulation/random_simulate_blocks.go:127 +0xc58
github.com/cosmos/cosmos-sdk/cmd/gaia/app.TestFullGaiaSimulation(0xc0004c7100)
	/home/cwgoes/working/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/sim_test.go:171 +0x2a9
testing.tRunner(0xc0004c7100, 0xffeb20)
	/usr/lib/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/lib/go/src/testing/testing.go:878 +0x353

We may have to debug that first, although both may be caused by the same underlying incorrect iteration issue.

@alexanderbez
Copy link
Contributor

alexanderbez commented Sep 12, 2018

Yeah, it seems in this case this particular validator was never re-bonded after height 6 and as a result, a new slashing period was never created. Digging further...

Update: Somewhat confused.

  • At height 6 the validator was sent to TM at end block with status unbonding and zero voting power, which should've removed it from the validator set.
  • I never see this validator sent to TM as bonded ever again.
  • However, I still see the validator's signature in blocks 7, 9, 10, ... (later a double sign at 281)

Iteration still remains...

@cwgoes
Copy link
Contributor Author

cwgoes commented Sep 13, 2018

However, I still see the validator's signature in blocks 7, 9, 10, ... (later a double sign at 281)

Strange - check the simulation validator update logic (the simulation doesn't actually run Tendermint)? Is the validator left in the map?

@cwgoes
Copy link
Contributor Author

cwgoes commented Oct 3, 2018

Closing in favor of #2430 (changes merged to that branch).

@cwgoes cwgoes closed this Oct 3, 2018
@cwgoes cwgoes deleted the cwgoes/fix-slashing-period-fetch branch October 3, 2018 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants