
nsqd: configurable queue scan worker pool #1178

Conversation

@andyxning (Member) commented Aug 23, 2019

This PR adds the queue-scan-worker-pool-max command line argument.

In our case, the cluster has 600+ channels, including a channel receiving deferred messages at high QPS, and the symptom is that deferred messages arrive with a large end-to-end delay, well past their deferred time. The root cause is that nsqd runs at most 4 queue scan workers, which is too few for a cluster of this size.

After setting queue-scan-worker-pool-max to 64, deferred message delivery times are much more accurate.
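For illustration, a minimal sketch of how such an option could be exposed with Go's standard flag package (this is an assumption for illustration, not the PR's actual code; the default of 4 matches the previously hard-coded worker count mentioned above, and the help text is paraphrased). With the flag in place, the tuning above is simply `nsqd --queue-scan-worker-pool-max=64`.

```go
package main

import (
	"flag"
	"fmt"
)

// Illustrative sketch (not the PR's actual code): exposing the scan worker
// pool size as a command line flag. The default of 4 matches the previously
// hard-coded limit; the help text is paraphrased.
func main() {
	queueScanWorkerPoolMax := flag.Int("queue-scan-worker-pool-max", 4,
		"max concurrent goroutines scanning channels for in-flight and deferred timeouts")
	flag.Parse()

	fmt.Println("queue-scan-worker-pool-max =", *queueScanWorkerPoolMax)
}
```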

@andyxning (Member Author)

/cc @jehiah @mreiferson @ploxiln

@ploxiln (Member) commented Aug 23, 2019

Good point.

I think the description of the flag could be better; it's not really clear what it's for just from the description. But it's not easy to come up with a concise description for this, since it's sort of an implementation detail. I might say something like "max coroutines for managing in-flight and deferred queues".

By the way, because QueueScanSelectionCount is 20, at most 20 workers can do work at once: queueScanLoop() sends 20 channels to the workers, then waits for them all to be done (then repeats). So 44 of your 64 queueScanWorkers are doing nothing. Maybe this should be adjustable as well? Maybe it should just automatically be set to worker-pool-max if worker-pool-max is greater?
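To make the fan-out concrete, here is a simplified model of that loop shape (an illustrative sketch using the numbers from this thread, not nsqd's actual queueScanLoop/queueScanWorker code):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Simplified model: each pass the scan loop picks at most selectionCount
// channels, hands them to the worker pool, then blocks until the whole batch
// has reported back. With poolSize greater than selectionCount, the extra
// workers never receive any work.
func main() {
	const (
		poolSize       = 64  // --queue-scan-worker-pool-max
		selectionCount = 20  // QueueScanSelectionCount
		numChannels    = 600 // channels in the cluster
	)

	workCh := make(chan int)      // index of a channel to scan
	responseCh := make(chan bool) // whether the scan found expired messages

	// worker pool
	for w := 0; w < poolSize; w++ {
		go func() {
			for range workCh {
				time.Sleep(time.Millisecond) // pretend to process in-flight/deferred timeouts
				responseCh <- rand.Intn(4) == 0
			}
		}()
	}

	// one scan pass
	batch := selectionCount
	if batch > numChannels {
		batch = numChannels
	}
	for i := 0; i < batch; i++ {
		workCh <- rand.Intn(numChannels)
	}
	dirty := 0
	for i := 0; i < batch; i++ {
		if <-responseCh { // wait for the whole batch before starting the next pass
			dirty++
		}
	}
	fmt.Printf("pass done: %d/%d channels dirty; at most %d of %d workers were ever busy\n",
		dirty, batch, batch, poolSize)
}
```

Because each pass blocks on the whole batch before dispatching more work, raising only the pool size past the selection count cannot increase concurrency.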

I was thinking about basing these numbers on the CPU cores available, which may be convenient for users that run nsqd on different-sized servers so they don't have to configure it explicitly, e.g. a 2-core machine in development and a 32-core machine in production. But the user might not want nsqd to go crazy with workers just because it's in one of many containers on a large server. So just being configurable is a fine idea.
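As a side note, a hypothetical sketch of the CPU-based default mentioned above (not something this PR implements):

```go
package main

import (
	"fmt"
	"runtime"
)

// Hypothetical default based on visible CPU cores, as an alternative to an
// explicit flag. Caveat from the discussion: runtime.NumCPU() reports the
// logical cores visible to the process and does not account for container
// CPU quotas, so on a big shared host it can be much larger than intended.
func main() {
	poolMax := runtime.NumCPU()
	fmt.Println("derived queue-scan-worker-pool-max:", poolMax)
}
```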

@andyxning (Member Author)

> I might say something like "max coroutines for managing in-flight and deferred queues".

Done.

> Maybe this should be adjustable as well? Maybe it should just automatically be set to worker-pool-max if worker-pool-max is greater?

> But the user might not want nsqd to go crazy with workers just because it's in one of many containers on a large server. So just being configurable is a fine idea.

Allowing end users to customize these seems more reasonable. What we should do is make the descriptions more readable and clear. :)

@andyxning force-pushed the add_queue-scan-worker-pool-max_command_line_argument branch from e80debf to 213ccc3 on August 25, 2019 02:33
@andyxning changed the title from "add queue-scan-worker-pool-max command line argument" to "nsqd: add queue-scan-worker-pool-max and queue-scan-selection-count command line arguments" on Aug 25, 2019
@andyxning force-pushed the add_queue-scan-worker-pool-max_command_line_argument branch from 213ccc3 to ff142d9 on August 25, 2019 02:39
--queue-scan-worker-pool-max and --queue-scan-selection-count
@ploxiln force-pushed the add_queue-scan-worker-pool-max_command_line_argument branch from ff142d9 to e4e927d on August 29, 2019 05:36
@ploxiln (Member) commented Aug 29, 2019

I re-worded the help description again, I hope that's OK.

@ploxiln changed the title from "nsqd: add queue-scan-worker-pool-max and queue-scan-selection-count command line arguments" to "nsqd: add some queue-scan options flags" on Aug 29, 2019
@ploxiln merged commit 6774510 into nsqio:master on Aug 30, 2019
@mreiferson changed the title from "nsqd: add some queue-scan options flags" to "nsqd: configurable queue scan worker pool" on Jun 14, 2020