nsqd: data format and/or import #373
Actually, feel free to close this if you want. I might go with persisting the progress somewhere else and just continuing the job from where it left off; that'll be cheaper for us anyway, since S3 drops connections pretty frequently.
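A minimal sketch of "persist progress and continue from where it left off", assuming progress is a per-file count of events already published and that a JSON file on local disk stands in for wherever you actually keep it. `process_file`, `load_checkpoint`, `save_checkpoint` and the `publish` callback (standing in for the PUB to nsqd) are hypothetical names, not part of NSQ:

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def load_checkpoint(path: str) -> dict:
    """Return {file_key: events_already_published}, or {} if no checkpoint exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    """Write atomically: temp file in the same directory, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def process_file(file_key: str, events: Iterable[bytes],
                 publish: Callable[[bytes], None], checkpoint_path: str,
                 every: int = 1000) -> int:
    """Publish one file's events, skipping those an earlier attempt already sent.

    Returns how many events this call published.
    """
    state = load_checkpoint(checkpoint_path)
    done = state.get(file_key, 0)
    sent = 0
    for i, event in enumerate(events):
        if i < done:
            continue  # published by an earlier, interrupted attempt
        publish(event)
        sent += 1
        if (i + 1) % every == 0:  # checkpoint every `every` events
            state[file_key] = i + 1
            save_checkpoint(checkpoint_path, state)
    state[file_key] = done + sent
    save_checkpoint(checkpoint_path, state)
    return sent
```

Checkpointing every `every` events keeps writes cheap, but a crash between a PUB and the next checkpoint means up to `every` events get sent again, so this is still at-least-once.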
One possible solution would be to create a topic for a given file on an "ephemeral" nsqd. When you've successfully pushed all of the messages for that file to that nsqd, you could start
@dudleycarr's suggestion is interesting; you wouldn't even need to use

I think it probably makes sense to track some state for these jobs somewhere. First, it probably makes sense to divide the high-level operation into sub-tasks, each of which could have its state tracked individually. If the various phases of the operation are prone to failure for any number of reasons, it seems like you need to keep track of intermediate progress. NSQ can continue to serve as transport and work dispatch, but the workers would benefit from this domain-specific state when deciding whether to redo a given sub-task.

Since you asked, it wouldn't be too hard to produce

Relatedly #304 was interested in
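A minimal sketch of tracking per-sub-task state, assuming the sub-tasks are the fetch, unpack and PUB steps from the original use case and that an in-memory dict stands in for a durable store. `Phase`, `JobState` and `run_file` are hypothetical names; a worker handling a requeued message skips whatever phases already completed:

```python
from enum import IntEnum
from typing import Callable, Dict


class Phase(IntEnum):
    """Sub-tasks for one S3 file, in the order they must complete."""
    PENDING = 0
    FETCHED = 1
    UNPACKED = 2
    PUBLISHED = 3


class JobState:
    """Last completed phase per file.

    The dict is an in-memory stand-in for a durable store (a database, Redis, ...).
    """

    def __init__(self) -> None:
        self._phases: Dict[str, Phase] = {}

    def get(self, file_key: str) -> Phase:
        return self._phases.get(file_key, Phase.PENDING)

    def mark(self, file_key: str, phase: Phase) -> None:
        self._phases[file_key] = phase


def run_file(file_key: str, state: JobState,
             steps: Dict[Phase, Callable[[str], None]]) -> None:
    """Run only the sub-tasks that have not completed yet for file_key.

    steps maps each phase to the function that reaches it. A step that raises
    leaves the recorded phase unchanged, so a retry resumes at that step.
    """
    for phase in (Phase.FETCHED, Phase.UNPACKED, Phase.PUBLISHED):
        if state.get(file_key) >= phase:
            continue  # finished by an earlier attempt
        steps[phase](file_key)
        state.mark(file_key, phase)
```

NSQ still delivers the "process this file" message; the recorded phase is what lets the worker decide which sub-tasks to redo.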
Ended up just deduping in Redis for now, so it can requeue without any problems. Not a huge deal for this use case, so I'll close. Thanks guys!
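A minimal sketch of deduping so a requeue is harmless, assuming each event is identified by its source file and position. A set stands in for Redis here; against real Redis the check-and-set would presumably be an atomic `SET` with the `NX` option and the release a `DEL`. `SeenStore`, `dedup_key` and `handle` are hypothetical names, and whether the check runs before the PUB or in the consumer is left open:

```python
import hashlib
from typing import Callable, Set


class SeenStore:
    """In-memory stand-in for a shared store such as Redis."""

    def __init__(self) -> None:
        self._keys: Set[str] = set()

    def add_if_absent(self, key: str) -> bool:
        """Return True only for the first caller with this key."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

    def discard(self, key: str) -> None:
        self._keys.discard(key)


def dedup_key(file_key: str, index: int) -> str:
    """Key on source file and position rather than content, so two
    legitimately identical events are not collapsed into one."""
    return hashlib.sha1(f"{file_key}:{index}".encode()).hexdigest()


def handle(file_key: str, index: int, body: bytes, seen: SeenStore,
           process: Callable[[bytes], None]) -> bool:
    """Process an event once; return False if it was already handled."""
    key = dedup_key(file_key, index)
    if not seen.add_if_absent(key):
        return False  # handled by an earlier delivery
    try:
        process(body)
    except Exception:
        seen.discard(key)  # release the key so a retry can process it
        raise
    return True
```

Releasing the key on failure avoids silently dropping an event, but if the process dies outright after claiming the key, that event is skipped on retry; a TTL on the key is one way to bound that.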
pragmatic 👍
So I have a bit of a weird use case: we have very large jobs that fetch 700k+ files from S3, unpack them, and PUB hundreds of thousands of events per file, and we need each job to be somewhat atomic.
Right now the prototype I've put together works fine, but if a job fails at any point and gets requeued, we have a ton of duplicate work to redo, and half of it has already been queued.
My plan is/was to write nsqd's .dat files somewhere on disk and then copy them over for nsqd to soak up once the job is complete. That leads to my questions:
a) would you be interested in some sort of import-from-file functionality?
b) is there anything I should watch out for when doing this?
c) any documentation on the binary format? (I'll dig around)
cheers
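A minimal sketch of the same plan that avoids depending on nsqd's own .dat layout (presumably an internal detail of nsqd): spool each file's events to disk in a simple format of your own, and only once the spool is complete, replay it with ordinary PUBs. The 4-byte big-endian length prefix is an assumption of this sketch, not nsqd's format, and `write_spool`, `read_spool`, `replay` and the `publish` callback are hypothetical names:

```python
import os
import struct
from typing import Callable, Iterable, Iterator

_LEN = struct.Struct(">I")  # 4-byte big-endian length prefix (this sketch's own framing)


def write_spool(path: str, events: Iterable[bytes]) -> int:
    """Write length-prefixed events to path + ".part", then rename to path.

    The rename only happens after every event is written and fsynced, so a
    finished spool file either exists completely or not at all.
    """
    tmp = path + ".part"
    count = 0
    with open(tmp, "wb") as f:
        for body in events:
            f.write(_LEN.pack(len(body)))
            f.write(body)
            count += 1
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    return count


def read_spool(path: str) -> Iterator[bytes]:
    """Yield events back from a file written by write_spool."""
    with open(path, "rb") as f:
        while True:
            header = f.read(_LEN.size)
            if not header:
                return  # clean end of file
            if len(header) < _LEN.size:
                raise ValueError("truncated length prefix")
            (n,) = _LEN.unpack(header)
            body = f.read(n)
            if len(body) < n:
                raise ValueError("truncated event body")
            yield body


def replay(path: str, publish: Callable[[bytes], None]) -> int:
    """PUB every spooled event through the caller-supplied publish callback."""
    sent = 0
    for body in read_spool(path):
        publish(body)
        sent += 1
    return sent
```

The replay step can itself be interrupted, so it still needs a checkpoint or dedup on top to avoid re-sending events.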