-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsqd: reduce client/channel lock contention #701
Conversation
doRequeue is always called inside exitLock now
seeing these race detector failures:
|
this is exciting! |
just as we don't need to lock the whole instance for the lifetime of the stats gathering, we don't need to lock the topic for the lifetime of the channel stats gathering
d887f43
to
48b2872
Compare
@@ -214,7 +215,7 @@ func (c *clientV2) Identify(data identifyDataV2) error { | |||
} | |||
|
|||
func (c *clientV2) Stats() ClientStats { | |||
c.RLock() | |||
c.metaLock.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this was ultimately the most problematic case.
Previously, this was competing with network writes to this client here, so when that blocked, this blocked. It's called as part of the underlying code path for /stats
here. Prior to #700, the body of nsqd.GetStats()
would hold an nsqd-wide lock for the entire lifetime of the function call, meaning that a single blocking client could effectively block any operations that required a top-level nsqd-wide lock, or a topic/channel-level lock for whichever the misbehaving client was subscribed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. this has the really nice side-effect of making /stats
requests really fast, regardless of nsqd
load, addressing long-standing issues with nsqadmin
timing out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay!
green light RFR @jehiah |
nsqd: reduce client/channel lock contention
@zachbadgett thanks for the follow up review on this! |
Of course :D |
While evaluating a production performance issue I uncovered some
unnecessary lock contention inside the channel and client code
paths.
The ultimate cause of the issue was that we held client-wide,
channel-wide (and ultimately NSQ-wide, see #700) locks during network
operations, and if those operations block it would prevent other
simultaneous, unrelated, operations from making progress.