nsqd: persisted data repeatedly consumed #730

mreiferson · 2016-04-09T18:51:06Z

Hi,
I have a question about how nsqd updates its persisted metadata file.
My nsqd version:

nsqd v0.3.7 (built w/go1.6)

When nsqd starts up, it loads persisted data from disk and the subscriber consumes the messages, but nsqd does not update the persisted metadata file. If nsqd then crashes, my subscriber consumes the same messages again after the restart. I know nsqd guarantees that "messages are delivered at least once", but can this scenario be improved?

ploxiln · 2016-04-01T19:15:12Z

The metadata file should have (eventually) been updated. It may not have been synced to disk until some time and/or more messages were processed ... but it would indeed be a bug if it could wait indefinitely before being synced. See the --sync options:

  -sync-every int
        number of messages per diskqueue fsync (default 2500)
  -sync-timeout duration
        duration of time per diskqueue fsync (default 2s)

Looks like, by default, the metadata file should be written and fsynced every 2 seconds. If you can reproduce a case where that does not happen, it's a bug :)
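
To make the intended interaction of the two flags concrete, here is a minimal sketch. It is not nsqd's actual code: `syncState` and its methods are hypothetical names, and the "fsync" is only a counter. It assumes a sync fires either when the message count reaches the `-sync-every` value or when the `-sync-timeout` ticker fires with unsynced work pending.

```go
package main

import (
	"fmt"
	"time"
)

// syncState is a hypothetical stand-in for a disk-backed queue's sync bookkeeping.
type syncState struct {
	syncEvery int64 // models -sync-every
	count     int64 // messages handled since the last sync
	syncs     int   // number of simulated fsyncs
}

// handleMessage records one message and syncs once syncEvery is reached.
func (s *syncState) handleMessage() {
	s.count++
	if s.count == s.syncEvery {
		s.sync()
	}
}

// handleTick models the -sync-timeout ticker firing: sync only if work is pending.
func (s *syncState) handleTick() {
	if s.count > 0 {
		s.sync()
	}
}

// sync counts a simulated fsync; a real queue would persist its metadata here.
func (s *syncState) sync() {
	s.syncs++
	s.count = 0
}

func main() {
	s := &syncState{syncEvery: 2500}
	// A short ticker stands in for -sync-timeout (default 2s) to keep the demo fast.
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	s.handleMessage() // one message, far below syncEvery, so no sync yet
	<-ticker.C
	s.handleTick() // the timeout flushes the pending state
	fmt.Println("syncs:", s.syncs)
}
```

Under this model, a single message can stay unsynced for at most one timeout interval, which is the behaviour described above.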

jinhao · 2016-04-05T02:27:33Z

@ploxiln It seems the -sync-timeout option does not work as we expected. I can reproduce the case every time; you can try my scenario.

mreiferson · 2016-04-05T16:39:00Z

@jinhao can you list the steps to reproduce?

jinhao · 2016-04-06T01:47:16Z

@mreiferson

My topic is uploadMsg and the channel is default.
My meta file is uploadMsg:default.diskqueue.meta.dat.

1. Publish msgA to topicA (no consumer is running, so the message is not consumed).
2. Run kill -2 on the nsqd pid (nsqd writes the meta file and msgA of topicA to disk). The meta file now contains:

  1
  0,2622
  0,2760

3. Restart nsqd (it reloads msgA).
4. Start nsq_tail, which consumes msgA.
5. Observe that the meta file is not updated.
6. Run kill -9 on the nsqd pid.
7. Restart nsqd and see that it reloads msgA again. The meta file still contains:

  1
  0,2622
  0,2760
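
For readers unfamiliar with the three-line metadata file shown above, here is a small sketch that parses it. It assumes the layout is the queue depth on the first line, then the read file number and read offset, then the write file number and write offset. The `queueMeta` type and `parseMeta` function are hypothetical names for illustration, not part of nsqd.

```go
package main

import "fmt"

// queueMeta is a hypothetical view of the metadata file's contents.
type queueMeta struct {
	Depth        int64 // number of messages still queued
	ReadFileNum  int64 // data file the next read comes from
	ReadPos      int64 // byte offset of the next read
	WriteFileNum int64 // data file the next write goes to
	WritePos     int64 // byte offset of the next write
}

// parseMeta reads the assumed "depth / readFile,readPos / writeFile,writePos" layout.
func parseMeta(s string) (queueMeta, error) {
	var m queueMeta
	_, err := fmt.Sscanf(s, "%d\n%d,%d\n%d,%d\n",
		&m.Depth, &m.ReadFileNum, &m.ReadPos, &m.WriteFileNum, &m.WritePos)
	return m, err
}

func main() {
	m, err := parseMeta("1\n0,2622\n0,2760\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", m)
	// If the read offset is behind the write offset, data is still pending.
	fmt.Println("unread bytes in file:", m.WritePos-m.ReadPos)
}
```

Read this way, an unchanged file after msgA was consumed means the read offset on disk never advanced, so a restart after kill -9 replays the message.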

mreiferson · 2016-04-06T21:37:56Z

@jinhao what are the command line flags you're using for nsqd?

jinhao · 2016-04-07T09:15:28Z

@mreiferson
first time:

./nsqd -tcp-address=:4250 -http-address=:4251 --lookupd-tcp-address=172.16.154.105:4160 -broadcast-address=172.16.154.105

second:

./nsqd -sync-every=1 -tcp-address=:4250 -http-address=:4251 --lookupd-tcp-address=172.16.154.105:4160 -broadcast-address=172.16.154.105

Neither works.

mreiferson · 2016-04-09T15:51:27Z

OK, I understand what's happening.

nsqd will only sync metadata when writes have occurred, see: https://github.com/nsqio/nsq/blob/master/nsqd/diskqueue.go#L618-L622

This feels wrong, for this exact reason. We can still avoid "unnecessary" syncs, which is what I think the code was intended to guard against, by counting reads too.
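
A simplified model of the behaviour described above, and of the proposed change, might look like the following. This is a sketch and not the code in diskqueue.go: `diskQueueModel`, its methods, and the `countReads` switch are hypothetical, and a "sync" is just a counter.

```go
package main

import "fmt"

// diskQueueModel is a hypothetical, heavily simplified model of the sync guard.
type diskQueueModel struct {
	countReads bool  // false models the reported behaviour, true the proposed fix
	count      int64 // operations seen since the last sync
	syncs      int   // number of simulated metadata syncs
}

// write always counts towards the next sync.
func (q *diskQueueModel) write() { q.count++ }

// read counts towards the next sync only when countReads is enabled.
func (q *diskQueueModel) read() {
	if q.countReads {
		q.count++
	}
}

// tick models the sync timer: it skips the sync when nothing was counted.
func (q *diskQueueModel) tick() {
	if q.count == 0 {
		return // avoids "unnecessary" syncs
	}
	q.count = 0
	q.syncs++
}

// syncsAfterReadOnly replays the scenario from this issue: a restart,
// one consumed message, no new writes, then a timer tick.
func syncsAfterReadOnly(countReads bool) int {
	q := &diskQueueModel{countReads: countReads}
	q.read()
	q.tick()
	return q.syncs
}

func main() {
	fmt.Println("syncs when only writes are counted:", syncsAfterReadOnly(false))
	fmt.Println("syncs when reads are counted too:", syncsAfterReadOnly(true))
}
```

In this model the guard against idle syncs is preserved in both modes; the only difference is whether a consumed message counts as activity.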

Thoughts @jehiah ?

jehiah · 2016-04-09T18:19:28Z

@mreiferson yeah i agree. it should count reads and writes equally

mreiferson · 2016-04-09T18:51:26Z

RFR

jehiah · 2016-04-09T19:54:48Z

nsqd/diskqueue.go

 	var r chan []byte

 	syncTicker := time.NewTicker(d.syncTimeout)

 	for {
 		// dont sync all the time :)
-		if count == d.syncEvery {
-			count = 0
+		if wcount == d.syncEvery {


don't we want (wcount + rcount) == d.syncEvery ?

And more generally, do we need to distinguish between read and write counts, can we always use a single count var?

can we always use a single count var?

That should simplify things.

mreiferson · 2016-04-09T21:38:25Z

ready @jehiah

jinhao · 2016-04-11T02:06:02Z

@mreiferson
Can you release a new version of nsq?

I tried to get the latest code from GitHub and rebuilt nsq,
but nsqd cannot communicate with nsqlookupd.
It seems the new nsqd uses protocol 'V2', but nsqlookupd still uses 'V1'.
nsqlookupd log:

[nsqlookupd] 2016/04/11 10:01:01.732021 CLIENT(127.0.0.1:63929): desired protocol magic '  V1'
[nsqlookupd] 2016/04/11 10:01:01.732123 ERROR: [127.0.0.1:63929] - E_BAD_BODY IDENTIFY    missing fields
[nsqlookupd] 2016/04/11 10:01:01.732179 CLIENT(127.0.0.1:63929): closing
[nsqlookupd] 2016/04/11 10:01:01.732192 ERROR: client(127.0.0.1:63929) - E_BAD_BODY IDENTIFY missing fields

judwhite · 2016-04-18T02:41:24Z

@jinhao That looks like you have nsqd -lookupd-tcp-address set to connect to the nsqlookupd HTTP port (4161). Can you try 4160 instead? If that's not the case can you paste your nsqd and nsqlookupd args?

mreiferson added the question label Apr 1, 2016

mreiferson changed the title ~~persisted data consumed repeated~~ nsqd: persisted data repeatedly consumed Apr 1, 2016

mreiferson added the bug label Apr 5, 2016

mreiferson removed the question label Apr 9, 2016

jehiah reviewed Apr 9, 2016

nsqd: diskqueue syncs when only reads have occurred

497111e

mreiferson force-pushed the diskqueue-read-sync-730 branch from da8c8aa to 497111e Compare April 9, 2016 21:38

jehiah merged commit 6ec6bee into nsqio:master Apr 9, 2016

mreiferson deleted the diskqueue-read-sync-730 branch April 10, 2016 05:58
