Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race Condition in Network Package Causes Gossamer to Crash #2393

Closed
kishansagathiya opened this issue Mar 18, 2022 · 4 comments · Fixed by #2408
Closed

Race Condition in Network Package Causes Gossamer to Crash #2393

kishansagathiya opened this issue Mar 18, 2022 · 4 comments · Fixed by #2408

Comments

@kishansagathiya
Copy link
Contributor

Describe the bug

  • While running a devnet, one or more nodes shut down because of runtime: unknown pc error.

Expected Behavior

Current Behavior

Possible Solution

To Reproduce

Steps to reproduce the behaviour:

Log output

Specification

  • go version:
  • gossamer version:
  • gossamer commit tag:
  • gossamer commit hash:
  • operating system:
  • additional links:
@kishansagathiya kishansagathiya changed the title gossamer shuts down during devnet gossamer shuts down in a devnet Mar 18, 2022
@qdm12
Copy link
Contributor

qdm12 commented Mar 18, 2022

The problem seems to come from 5843324 as far as a few minutes running Gossamer was done for each commit from 86624cf down to a few commits. I'm still testing to verify this is right, and will make a fix PR for that commit.

@kishansagathiya
Copy link
Contributor Author

I had a feeling that it could be because of a race condition is get functions in dot/network/peersdata.go.

For example,

func (p *peersData) getMutex(peerID peer.ID) *sync.Mutex {
p.mutexesMu.RLock()
defer p.mutexesMu.RUnlock()
return p.mutexes[peerID]
}

I think differs runs before return statements. (https://www.digitalocean.com/community/tutorials/understanding-defer-in-go, https://medium.com/a-journey-with-go/go-how-does-defer-statement-work-1a9492689b6e)

So, we are locking and unlocking before the task interesting to us. We should do something like this

 func (p *peersData) getMutex(peerID peer.ID) *sync.Mutex {
        p.mutexesMu.RLock()
        defer p.mutexesMu.RUnlock()
-       return p.mutexes[peerID]
+
+       mtx := p.mutexes[peerID]
+       return mtx
 }

Same goes for getInboundHandshakeData, getOutboundHandshakeData.

I ran the network after fixing this, but I still saw the problem. But this does seem inappropriate and we should fix this.

@danforbes danforbes changed the title gossamer shuts down in a devnet Race Condition in Network Package Causes Gossamer to Crash Mar 20, 2022
@kishansagathiya
Copy link
Contributor Author

closed in #2393

@qdm12
Copy link
Contributor

qdm12 commented Apr 1, 2022

Fixed by #2408

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants