Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: datarace in the handshake #17

Closed
wants to merge 3 commits into from
Closed

Conversation

Kubuxu
Copy link
Member

@Kubuxu Kubuxu commented Aug 1, 2017

No description provided.

@Kubuxu
Copy link
Member Author

Kubuxu commented Aug 1, 2017

Replaces #16

@Kubuxu
Copy link
Member Author

Kubuxu commented Aug 1, 2017

@JustinDrake I think this is a bit better solution.

@Kubuxu Kubuxu force-pushed the fix/handshake/datarace branch from ebb6e89 to d3f3590 Compare August 1, 2017 12:58
@multiformats multiformats deleted a comment from coveralls Aug 1, 2017
@multiformats multiformats deleted a comment from coveralls Aug 1, 2017
os:
- linux
- osx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we disable macos for a reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OSX is delaying builds on Travis so I disable it in repos that don't have code that might depend on the system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

if !l.rhandshake {
l.rhlock.RLock()
rhandshake := l.rhandshake
l.rhlock.RUnlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check was intentionally racy to avoid atomic operations on the fast path. Go routines should erroneously conclude that the handshake hasn't happened at most once (and the synchronization inside the handshake functions should prevent double handshakes).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it might be only case when races are "good" but it screws over race test in all code that uses this. This means that we can't run race detector on big part of go-libp2p and go-ipfs code unless we tolerate false positives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's got to be a way to tell go this is "fine" (hopefully without writing a C extension). I know I've seen code in the go standard library that does this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can probably use internals/race. However, we can probably get a way with using sync.Once without losing much, if any, performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use internal/race internal packages are locked and can be used only used buy children of the package. In this case only buy stdlib.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to do it previously with Once but I run into some problems. One of them was that I had to implement AsyncDo (simple) so not every time new goroutine is started when we want to run something once and then I got some lockup but I could not figure out what was causing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why even do it in a goroutine? We need to wait for it to finish anyways. Can't we just have a lazyInitialize function and call once.Do(msm.lazyInitialize)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have two Once's, one for the read handshake, one for the write handshake (where each handshake side would asynchronously fire off the other side of the handshake (in a once)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, sorry for the massive delay...


l.rhlock.Lock()
defer l.rhlock.Unlock()

l.rhsync = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is racy. You can't check, unlock, then relock and expect the state to stay the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I think I missed because I restructured this code twice while writing it reverting to simpler solution.

Let's figure out #17 (comment) first.

return l.werr
}

l.whlock.Lock()
defer l.whlock.Unlock()

l.whsync = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same race here.

whandshake := l.whandshake
l.whlock.RUnlock()

if !whandshake {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, intentionally a racy fast-path.

@b5
Copy link

b5 commented Feb 12, 2018

I'd love to see this figured out and merged, as it would let us turn the race detector back on for our p2p tests. If there's a specific area that needs help I'd be happy to pitch in!

@whyrusleeping
Copy link
Member

We could use the //go:norace pragma before these functions to tell the race detector to go home.

@Stebalien
Copy link
Member

I have a correct fix here: #21 but it needs review/discussion.

@Stebalien Stebalien closed this Feb 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants