-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: datarace in the handshake #17
Conversation
Replaces #16 |
@JustinDrake I think this is a bit better solution. |
ebb6e89
to
d3f3590
Compare
os: | ||
- linux | ||
- osx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we disable macos for a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSX is delaying builds on Travis so I disable it in repos that don't have code that might depend on the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
if !l.rhandshake { | ||
l.rhlock.RLock() | ||
rhandshake := l.rhandshake | ||
l.rhlock.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check was intentionally racy to avoid atomic operations on the fast path. Go routines should erroneously conclude that the handshake hasn't happened at most once (and the synchronization inside the handshake functions should prevent double handshakes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it might be only case when races are "good" but it screws over race test in all code that uses this. This means that we can't run race detector on big part of go-libp2p and go-ipfs code unless we tolerate false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's got to be a way to tell go this is "fine" (hopefully without writing a C extension). I know I've seen code in the go standard library that does this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can probably use internals/race
. However, we can probably get a way with using sync.Once
without losing much, if any, performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use internal/race
internal packages are locked and can be used only used buy children of the package. In this case only buy stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to do it previously with Once but I run into some problems. One of them was that I had to implement AsyncDo
(simple) so not every time new goroutine is started when we want to run something once and then I got some lockup but I could not figure out what was causing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why even do it in a goroutine? We need to wait for it to finish anyways. Can't we just have a lazyInitialize
function and call once.Do(msm.lazyInitialize)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have two Once's, one for the read handshake, one for the write handshake (where each handshake side would asynchronously fire off the other side of the handshake (in a once)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, sorry for the massive delay...
|
||
l.rhlock.Lock() | ||
defer l.rhlock.Unlock() | ||
|
||
l.rhsync = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is racy. You can't check, unlock, then relock and expect the state to stay the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I think I missed because I restructured this code twice while writing it reverting to simpler solution.
Let's figure out #17 (comment) first.
return l.werr | ||
} | ||
|
||
l.whlock.Lock() | ||
defer l.whlock.Unlock() | ||
|
||
l.whsync = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same race here.
whandshake := l.whandshake | ||
l.whlock.RUnlock() | ||
|
||
if !whandshake { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, intentionally a racy fast-path.
I'd love to see this figured out and merged, as it would let us turn the race detector back on for our p2p tests. If there's a specific area that needs help I'd be happy to pitch in! |
We could use the |
I have a correct fix here: #21 but it needs review/discussion. |
No description provided.