Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: data race condition in nsqd.GetStats() #970

Closed
daroot opened this issue Dec 8, 2017 · 1 comment
Closed

nsqd: data race condition in nsqd.GetStats() #970

daroot opened this issue Dec 8, 2017 · 1 comment
Labels

Comments

@daroot
Copy link

daroot commented Dec 8, 2017

While testing my own application while running an internal copy of nsqd, I discovered that there are a few places where c.inFlightMessages and c.deferredMessages are being accessed without proper locking of the c.inFlightMutex and c.deferredMutex. This causes my own tests to fail when run under go test -race, with backtraces similar to this:

WARNING: DATA RACE
Write at 0x00c420086510 by goroutine 135:
  runtime.mapdelete()
      /usr/local/Cellar/go/1.9.2/libexec/src/runtime/hashmap.go:635 +0x0
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*Channel).popInFlightMessage()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/channel.go:469 +0x1fd
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*Channel).FinishMessage()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/channel.go:351 +0x50
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*protocolV2).FIN()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/protocol_v2.go:686 +0x44d
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*protocolV2).Exec()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/protocol_v2.go:175 +0x13ba
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*protocolV2).IOLoop()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/protocol_v2.go:83 +0x59c
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*tcpServer).Handle()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/tcp.go:43 +0x82e

Previous read at 0x00c420086510 by goroutine 56:
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.NewChannelStats()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/stats.go:55 +0xe5
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*NSQD).GetStats()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/stats.go:157 +0x8f2
  gitlab.internal/dev/aqueduct/cmd/nsq-consumer.TestHappy()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/cmd/nsq-consumer/main_test.go:174 +0xd9e
  testing.tRunner()
      /usr/local/Cellar/go/1.9.2/libexec/src/testing/testing.go:746 +0x16c

Goroutine 135 (running) created at:
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/internal/protocol.TCPServer()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/internal/protocol/tcp_server.go:32 +0x4ca
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd.(*NSQD).Main.func1()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/nsqd/nsqd.go:231 +0xdf
  gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/internal/util.(*WaitGroupWrapper).Wrap.func1()
      /Users/droot/go/src/gitlab.internal/dev/aqueduct/vendor/github.com/nsqio/nsq/internal/util/wait_group_wrapper.go:14 +0x34

Goroutine 56 (running) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.9.2/libexec/src/testing/testing.go:789 +0x568
  testing.runTests.func1()
      /usr/local/Cellar/go/1.9.2/libexec/src/testing/testing.go:1004 +0xa7
  testing.tRunner()
      /usr/local/Cellar/go/1.9.2/libexec/src/testing/testing.go:746 +0x16c
  testing.runTests()
      /usr/local/Cellar/go/1.9.2/libexec/src/testing/testing.go:1002 +0x521
  testing.(*M).Run()
      /usr/local/Cellar/go/1.9.2/libexec/src/testing/testing.go:921 +0x206
  main.main()
      gitlab.internal/dev/aqueduct/cmd/nsq-consumer/_test/_testmain.go:54 +0x1d3
==================
--- FAIL: TestHappy (2.47s)
        testing.go:699: race detected during execution of test```

I'll submit a PR to fix this shortly.
@mreiferson
Copy link
Member

@daroot thanks for finding this, will follow up in #971!

@mreiferson mreiferson added the bug label Jan 8, 2018
@mreiferson mreiferson changed the title Data race condition in nsqd.GetStats() nsqd: data race condition in nsqd.GetStats() Jan 8, 2018
gola pushed a commit to gola/nsq that referenced this issue Oct 25, 2018
* create inFlightMessages and deferredMessages maps inside mutex.
* acquire appropriate mutexes before doing len() in NewChannelStats()
* add test that checks for passing -race when accessing stats while
  serving messages.

Fixes issue nsqio#970.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants