Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: persisted data repeatedly consumed #730

Merged
merged 1 commit into from
Apr 9, 2016

Conversation

mreiferson
Copy link
Member

hi,
i have a question about nsqd persisted meta file update.
my nsqd version:

nsqd v0.3.7 (built w/go1.6)

when nsqd start up, it load persisted data from disk, and then subscriber consumed the msg, but the nsqd not update the persist meta file, and then if nsqd crash when it restart, my subscriber will consume the same messages again. I know nsqd guarantees "messages are delivered at least onceAnchor link for: messages are delivered at least once", but is this scene can improve?

@ploxiln
Copy link
Member

ploxiln commented Apr 1, 2016

The metadata file should have (eventually) been updated. It may not have been synced to disk until some time and/or more messages were processed ... but it would indeed be a bug if it could wait indefinitely before being synced. See the --sync options:

  -sync-every int
        number of messages per diskqueue fsync (default 2500)
  -sync-timeout duration
        duration of time per diskqueue fsync (default 2s)

Looks like, by default, the metadata file should be written and fsynced every 2 seconds. If you can reproduce a case where that does not happen, it's a bug :)

@mreiferson mreiferson changed the title persisted data consumed repeated nsqd: persisted data repeatedly consumed Apr 1, 2016
@jinhao
Copy link
Author

jinhao commented Apr 5, 2016

@ploxiln it seems option -sync-timeout duration not work as our expected , the case can reproduce every time, you can try my scene.

@mreiferson
Copy link
Member

@jinhao can you list the steps to reproduce?

@mreiferson mreiferson added the bug label Apr 5, 2016
@jinhao
Copy link
Author

jinhao commented Apr 6, 2016

@mreiferson

my topic is uploadMsg channel is default
my meta fileuploadMsg:default.diskqueue.meta.dat

  1. publish msgA to topicA (no consumer consumed the msg)

  2. kill -2 nsqdpid (nsqd will update the meta file and msgA of topicA to disk)
    my meta file

    1
    0,2622
    0,2760

  3. restart the nsqd(reload the msgA)

  4. start nsq_tail consumed msgA

  5. see meta file not updated

  6. kill -9 nsqdpid

  7. restart the nsqd and will see it reload msgA again
    meta file is still

1
0,2622
0,2760

@mreiferson
Copy link
Member

@jinhao what are the command line flags you're using for nsqd?

@jinhao
Copy link
Author

jinhao commented Apr 7, 2016

@mreiferson
first time:

./nsqd -tcp-address=:4250 -http-address=:4251 --lookupd-tcp-address=172.16.154.105:4160 -broadcast-address=172.16.154.105

second:

./nsqd -sync-every=1 -tcp-address=:4250 -http-address=:4251 --lookupd-tcp-address=172.16.154.105:4160 -broadcast-address=172.16.154.105

both not work.

@mreiferson
Copy link
Member

OK, I understand what's happening.

nsqd will only sync metadata when writes have occurred, see: https://github.com/nsqio/nsq/blob/master/nsqd/diskqueue.go#L618-L622

This feels wrong, for this exact reason. We can still avoid "unnecessary" syncs, which is what I think the code was intended to guard against, by counting reads too.

Thoughts @jehiah ?

@jehiah
Copy link
Member

jehiah commented Apr 9, 2016

@mreiferson yeah i agree. it should count reads and writes equally

@mreiferson
Copy link
Member

RFR

var r chan []byte

syncTicker := time.NewTicker(d.syncTimeout)

for {
// dont sync all the time :)
if count == d.syncEvery {
count = 0
if wcount == d.syncEvery {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want (wcount + rcount) == d.syncEvery ?

And more generally, do we need to distinguish between read and write counts, can we always use a single count var?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we always use a single count var?

That should simplify things.

@mreiferson mreiferson force-pushed the diskqueue-read-sync-730 branch from da8c8aa to 497111e Compare April 9, 2016 21:38
@mreiferson
Copy link
Member

ready @jehiah

@jehiah jehiah merged commit 6ec6bee into nsqio:master Apr 9, 2016
@mreiferson mreiferson deleted the diskqueue-read-sync-730 branch April 10, 2016 05:58
@jinhao
Copy link
Author

jinhao commented Apr 11, 2016

@mreiferson
can release a new version of nsq

I try to get the latest code from github, and rebuild the nsq
but the nsqd cannot communicate with nsqlookupd
it seems new nsqd use protocol 'V2', but nsqlookupd still use 'V1'
nsqlookupd log

[nsqlookupd] 2016/04/11 10:01:01.732021 CLIENT(127.0.0.1:63929): desired protocol magic '  V1'
[nsqlookupd] 2016/04/11 10:01:01.732123 ERROR: [127.0.0.1:63929] - E_BAD_BODY IDENTIFY    missing fields
[nsqlookupd] 2016/04/11 10:01:01.732179 CLIENT(127.0.0.1:63929): closing
[nsqlookupd] 2016/04/11 10:01:01.732192 ERROR: client(127.0.0.1:63929) - E_BAD_BODY IDENTIFY missing fields

@judwhite
Copy link
Contributor

@jinhao That looks like you have nsqd -lookupd-tcp-address set to connect to the nsqlookupd HTTP port (4161). Can you try 4160 instead? If that's not the case can you paste your nsqd and nsqlookupd args?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants