Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Race in buildServerState #3

Closed
dnephin opened this issue Dec 22, 2020 · 2 comments · Fixed by #4
Closed

Data Race in buildServerState #3

dnephin opened this issue Dec 22, 2020 · 2 comments · Fixed by #4
Labels
bug Something isn't working

Comments

@dnephin
Copy link
Contributor

dnephin commented Dec 22, 2020

This comment seems to indicate that the assumption is that we are not modifying shared state outside of a lock, however the race detector (while running on the Consul test suite) seems to indicate that this field is not actually a copy. This write is racing with another goroutine that is trying to read the address.

WARNING: DATA RACE
Read at 0x00c00086e290 by goroutine 529:
  github.com/hashicorp/raft-autopilot.(*Autopilot).applyPromotions()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/reconcile.go:94 +0x5ac
  github.com/hashicorp/raft-autopilot.(*Autopilot).reconcile()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/reconcile.go:32 +0x2ab
  github.com/hashicorp/raft-autopilot.(*Autopilot).run()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/run.go:99 +0x4f5

Previous write at 0x00c00086e290 by goroutine 202:
  github.com/hashicorp/raft-autopilot.buildServerState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:308 +0x95c
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextServers()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:242 +0x1c8
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextStateWithInputs()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:164 +0x64
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:155 +0x7e
  github.com/hashicorp/raft-autopilot.(*Autopilot).updateState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:349 +0x6f
  github.com/hashicorp/raft-autopilot.(*Autopilot).runStateUpdater()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/run.go:131 +0x185

Goroutine 529 (running) created at:
  github.com/hashicorp/raft-autopilot.(*Autopilot).Start()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/run.go:34 +0x34e
  github.com/hashicorp/consul/agent/consul.(*Server).establishLeadership()
      /home/daniel/pers/code/consul/agent/consul/leader.go:344 +0x244
  github.com/hashicorp/consul/agent/consul.(*Server).leaderLoop()
      /home/daniel/pers/code/consul/agent/consul/leader.go:199 +0x10af
  github.com/hashicorp/consul/agent/consul.(*Server).monitorLeadership.func1()
      /home/daniel/pers/code/consul/agent/consul/leader.go:91 +0x79

Goroutine 202 (running) created at:
  github.com/hashicorp/raft-autopilot.(*Autopilot).run()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/run.go:72 +0x104
@dnephin dnephin added the bug Something isn't working label Dec 22, 2020
@dnephin
Copy link
Contributor Author

dnephin commented Dec 22, 2020

This is a similar problem, on another field

Write at 0x00c000de2da0 by goroutine 550:
  github.com/hashicorp/raft-autopilot.buildServerState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:320 +0x7ec
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextServers()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:242 +0x1c8
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextStateWithInputs()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:164 +0x64
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:155 +0x7e
  github.com/hashicorp/raft-autopilot.(*Autopilot).updateState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:349 +0x6f
  github.com/hashicorp/raft-autopilot.(*Autopilot).Start()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/run.go:32 +0x30f
  github.com/hashicorp/consul/agent/consul.(*Server).establishLeadership()
      /home/daniel/pers/code/consul/agent/consul/leader.go:344 +0x244
  github.com/hashicorp/consul/agent/consul.(*Server).leaderLoop()
      /home/daniel/pers/code/consul/agent/consul/leader.go:199 +0x10af
  github.com/hashicorp/consul/agent/consul.(*Server).monitorLeadership.func1()
      /home/daniel/pers/code/consul/agent/consul/leader.go:91 +0x79

Previous read at 0x00c000de2da0 by goroutine 748:
  github.com/hashicorp/consul/agent/consul.(*StatsFetcher).fetch()
      /home/daniel/pers/code/consul/agent/consul/stats_fetcher.go:56 +0x110

Goroutine 550 (running) created at:
  github.com/hashicorp/consul/agent/consul.(*Server).monitorLeadership()
      /home/daniel/pers/code/consul/agent/consul/leader.go:89 +0x38b

Goroutine 748 (running) created at:
  github.com/hashicorp/consul/agent/consul.(*StatsFetcher).Fetch()
      /home/daniel/pers/code/consul/agent/consul/stats_fetcher.go:99 +0x5de
  github.com/hashicorp/consul/agent/consul.(*AutopilotDelegate).FetchServerStats()
      /home/daniel/pers/code/consul/agent/consul/autopilot.go:41 +0x90
  github.com/hashicorp/raft-autopilot.(*Autopilot).gatherNextStateInputs()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:139 +0xae1
  github.com/hashicorp/raft-autopilot.(*Autopilot).nextState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:150 +0x53
  github.com/hashicorp/raft-autopilot.(*Autopilot).updateState()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:349 +0x6f
  github.com/hashicorp/raft-autopilot.(*Autopilot).Start()
      /home/daniel/go/pkg/mod/github.com/hashicorp/[email protected]/run.go:32 +0x30f
  github.com/hashicorp/consul/agent/consul.(*Server).establishLeadership()
      /home/daniel/pers/code/consul/agent/consul/leader.go:344 +0x244
  github.com/hashicorp/consul/agent/consul.(*Server).leaderLoop()
      /home/daniel/pers/code/consul/agent/consul/leader.go:199 +0x10af
  github.com/hashicorp/consul/agent/consul.(*Server).monitorLeadership.func1()
      /home/daniel/pers/code/consul/agent/consul/leader.go:91 +0x79
==================

@dnephin
Copy link
Contributor Author

dnephin commented Dec 22, 2020

I think the problem is that nextStateInputs.State is being set to the current state in gatherNextStateInputs, but the read lock is released as soon as that function exits. Later on in that same goroutine buildServerState is iterating over and writing to the list of servers in nextStateInputs.State (which is a pointer to the same state) without holding any lock! So when the reconcile goroutine attempts to use the state holding a lock, it races.

The second one I'm not sure about yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant