nsqd: panic on /stats when clients present #245

Merged
merged 1 commit into from
Aug 12, 2013

mreiferson
Member

The stack trace below is what I get on the console when my instance of nsqd stops responding. The messages being sent across the topics are plain text, not JSON, and each affected topic has one channel attached.

I've been able to reproduce this on a sandbox server with one topic named "test" and one channel named "chan". Sending a couple of messages across the wire, letting it sit for a while (~5 minutes), then coming back and sending another one causes the stats endpoint on nsqd to return empty results.

Total setup looks like this:

  • nsqlookupd
  • nsqd with TLS, connected to lookupd
  • nsqadmin connected to lookupd
  • nsq_pubsub example with one subscriber on topic test, channel chan.

Ubuntu 13.04/amd64 3.8.0-23-generic SMP

2013/08/11 00:27:22 http: panic serving 186.4.15.88:36956: runtime error: index out of range
/usr/lib/go/src/pkg/net/http/server.go:576 (0x4ce692)
        _func_003: buf.Write(debug.Stack())
/build/buildd/golang-1.0.2/src/pkg/runtime/proc.c:1443 (0x4338e5)
/build/buildd/golang-1.0.2/src/pkg/runtime/runtime.c:128 (0x4344c5)
/build/buildd/golang-1.0.2/src/pkg/runtime/runtime.c:85 (0x43436c)
/home/jtregunna/nsq/nsqd/stats.go:117 (0x41c22e)
        (*NSQd).getStats: clients[client_index] = client.Stats()
/home/jtregunna/nsq/nsqd/http.go:407 (0x40cecc)
        (*httpServer).statsHandler: stats := s.context.nsqd.getStats()
/home/jtregunna/nsq/nsqd/http.go:34 (0x40a53d)
        (*httpServer).ServeHTTP: s.statsHandler(w, req)
/usr/lib/go/src/pkg/net/http/server.go:656 (0x4c24a4)
        (*conn).serve: handler.ServeHTTP(w, w.req)
/build/buildd/golang-1.0.2/src/pkg/runtime/proc.c:271 (0x4319eb)
2013/08/11 00:27:25 LOOKUPD(picard.srcd.mp:4160): sending heartbeat
2013/08/11 00:27:25 http: panic serving 186.4.15.88:34498: runtime error: index out of range
/usr/lib/go/src/pkg/net/http/server.go:576 (0x4ce692)
        _func_003: buf.Write(debug.Stack())
/build/buildd/golang-1.0.2/src/pkg/runtime/proc.c:1443 (0x4338e5)
/build/buildd/golang-1.0.2/src/pkg/runtime/runtime.c:128 (0x4344c5)
/build/buildd/golang-1.0.2/src/pkg/runtime/runtime.c:85 (0x43436c)
/home/jtregunna/nsq/nsqd/stats.go:117 (0x41c22e)
        (*NSQd).getStats: clients[client_index] = client.Stats()
/home/jtregunna/nsq/nsqd/http.go:407 (0x40cecc)
        (*httpServer).statsHandler: stats := s.context.nsqd.getStats()
/home/jtregunna/nsq/nsqd/http.go:34 (0x40a53d)
        (*httpServer).ServeHTTP: s.statsHandler(w, req)
/usr/lib/go/src/pkg/net/http/server.go:656 (0x4c24a4)
        (*conn).serve: handler.ServeHTTP(w, w.req)
/build/buildd/golang-1.0.2/src/pkg/runtime/proc.c:271 (0x4319eb)

@mreiferson
Member

Thanks for the report. What's the Go version and NSQ revision?

@jeremytregunna
Author

Go 1.0.2 and NSQ 0.2.22-alpha (HEAD ref: 1154c59)

@jehiah
Member

jehiah commented Aug 11, 2013

@jeremytregunna I'm curious if you have a specific motivation for using Go 1.0.2 vs 1.0.3 or 1.1.

There were a lot of issues that were fixed in 1.0.3, and that's listed as the official minimum supported version on the install page. Are you trying to build nsq for something beyond the linux/darwin binary builds?

@jeremytregunna
Author

@jehiah No, I was exploring using nsq for a particular purpose, and my sandbox machine had 1.0.2 installed. I didn't see any note about 1.0.3, must have glossed over it. I'll close this ticket then, and only reopen if this is repeatable in a binary build.

@mreiferson
Member

@jeremytregunna cool, let us know how it goes and feel free to ask any other usage related questions on the user group.

FYI the latest stable binary distribution does not include TLS support, if that was an important feature for you. We should be stamping a new stable shortly but in the meantime there is a 0.2.22-alpha build up for download.

@jeremytregunna
Author

This is still an issue with Go 1.1.1 on the same hardware. Exact steps to reproduce:

  • Start nsqlookupd
  • Start nsqd (same args as before)
  • Start nsqadmin
  • Send a message
  • Query stats on nsqd's http interface after nsqadmin is up and running

This is the backtrace I get now:

2013/08/11 20:00:38 http: panic serving 186.4.15.88:36848: runtime error: index out of range
goroutine 28 [running]:
net/http.func·007()
        /usr/local/go/src/pkg/net/http/server.go:1022 +0xac
net/http.func·007()
        /usr/local/go/src/pkg/net/http/server.go:1022 +0xac
main.(*NSQd).getStats(0xc2000c3a90, 0x0, 0x0, 0x0)
        /home/jtregunna/nsq/nsqd/stats.go:117 +0x852
main.(*httpServer).statsHandler(0xc200184b70, 0xc200190600, 0xc2001d44d0, 0xc2002e9b60)
        /home/jtregunna/nsq/nsqd/http.go:407 +0x367
main.(*httpServer).ServeHTTP(0xc200184b70, 0xc200190600, 0xc2001d44d0, 0xc2002e9b60)
        /home/jtregunna/nsq/nsqd/http.go:34 +0xb6c
net/http.serverHandler.ServeHTTP(0xc2002ffbe0, 0xc200190600, 0xc2001d44d0, 0xc2002e9b60)
        /usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc2001af900)
        /usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
        /usr/local/go/src/pkg/net/http/server.go:1564 +0x266

@mreiferson
Member

I'm guessing I probably broke something in #242

Will take a look in a few.

@mreiferson
Member

yep, #242 introduced a regression relating to walking the map of clients when collecting stats.

RFR @jehiah

(I will add a test for this as well)
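
The panicking line in the traces above, `clients[client_index] = client.Stats()`, together with the note about walking the map of clients, suggests one way this kind of panic can arise. The following is a minimal sketch under that assumption, not nsqd's actual code: the types `client` and `ClientStats` and the functions `collectBroken`, `collectFixed`, and `panics` are hypothetical names, and the real fix may look different.

```go
package main

import "fmt"

// ClientStats and client are hypothetical stand-ins, not nsqd's real types.
type ClientStats struct {
	ID int64
}

type client struct {
	id int64
}

func (c *client) Stats() ClientStats { return ClientStats{ID: c.id} }

// collectBroken uses the map key as a slice index. A map key is an arbitrary
// ID rather than a position in 0..len-1, so any key >= len(clients) panics
// with "index out of range".
func collectBroken(clients map[int64]*client) []ClientStats {
	out := make([]ClientStats, len(clients))
	for key, c := range clients {
		out[key] = c.Stats()
	}
	return out
}

// collectFixed ignores the key and appends, so it works for any key values.
func collectFixed(clients map[int64]*client) []ClientStats {
	out := make([]ClientStats, 0, len(clients))
	for _, c := range clients {
		out = append(out, c.Stats())
	}
	return out
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	clients := map[int64]*client{
		7:  &client{id: 7},
		42: &client{id: 42},
	}
	fmt.Println("broken panics:", panics(func() { collectBroken(clients) }))
	fmt.Println("fixed length:", len(collectFixed(clients)))
}
```

With no clients connected the map is empty and neither loop body runs, which would be consistent with the panic appearing only when clients are present.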

@mreiferson
Member

@jeremytregunna if you wanna pull down this revision and confirm this fixes it for you that would be helpful as well, thanks!

@jeremytregunna
Author

@mreiferson I reverted 81cf8e8, then cleaned, installed, and tried to reproduce again. I got the same behaviour, with the same results as in the last crash log I posted.

@mreiferson
Member

that's odd, I can't reproduce after applying that fix (test added btw)...

Can you double check the binary you're running contains the fix?

@jeremytregunna
Author

@mreiferson Sorry, my mistake. a033faa does not cause the crash.

@mreiferson
Member

@jeremytregunna great, thanks for your help in tracking this down.

jehiah added a commit that referenced this pull request Aug 12, 2013
nsqd: panic on /stats when clients present
@jehiah jehiah merged commit 98ca2d9 into nsqio:master Aug 12, 2013