Remove or increase the consul MAX_SERVICES limit #1938

Closed
jkoppe opened this issue Sep 22, 2015 · 6 comments
@jkoppe

jkoppe commented Sep 22, 2015

I get why this limit exists -- you probably don't want to overwhelm your systems with unbounded growth.

We have hundreds of services with healthchecks in our consul clusters. The current limit of 50 is drastically hampering our ability to make use of the Service Checks + Monitor features of datadog. At best, we could whitelist the most critical 50 services and get monitoring for those, but I'd really like to be able to monitor hundreds of distinct services.

I can easily submit a pull request for the dd-agent limit increase, but I wanted to discuss with y'all if anything else is necessary on your server side and what a more reasonable & agreed upon limit might be.

@talwai
Contributor

talwai commented Sep 24, 2015

Hi @jkoppe - we're happy to increase the limit within reasonable bounds. The current threshold of 50 felt in the ballpark for us, but was not painstakingly researched. What's a number that makes sense for your particular use case?

@talwai
Contributor

talwai commented Sep 24, 2015

While MAX_SERVICES isn't configurable by the standard means right now, you can always override the standard check by:

  1. curl https://raw.githubusercontent.com/DataDog/dd-agent/master/checks.d/consul.py > ./consul.py
  2. sudo mv ./consul.py /etc/dd-agent/checks.d/
  3. Swapping out MAX_SERVICES here for your preferred value.
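
For step 3, a minimal sketch of swapping the value in a local copy of the check could look like the following. It assumes the limit appears in consul.py as a plain assignment of the form `MAX_SERVICES = 50`; the helper name `set_max_services` and the `ConsulCheck` class in the sample text are hypothetical and only there for illustration.

```python
import re


def set_max_services(source: str, new_limit: int) -> str:
    """Return `source` with the first MAX_SERVICES assignment set to `new_limit`.

    Assumption: the limit is written as `MAX_SERVICES = <integer>` at the
    start of a (possibly indented) line in the check's source.
    """
    pattern = re.compile(r"^(\s*MAX_SERVICES\s*=\s*)\d+", re.MULTILINE)
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the new limit.
    patched, count = pattern.subn(
        lambda m: m.group(1) + str(new_limit), source, count=1
    )
    if count == 0:
        raise ValueError("no MAX_SERVICES assignment found")
    return patched


# Hypothetical stand-in for the relevant part of consul.py.
sample = "class ConsulCheck(object):\n    MAX_SERVICES = 50  # cap\n"
print(set_max_services(sample, 500))
```

In practice the same edit can be made by hand in an editor; the sketch only makes explicit which line is being changed.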

While this is not a sustainable solution, it will help us be a bit more empirical about this decision.
It would be great if you could try the above and then send us the output of
sudo -u dd-agent /opt/datadog-agent/agent/agent.py check consul
The output will contain a summary of the number of metrics / service checks sent by a one-off run of the check.

@talwai talwai self-assigned this Sep 24, 2015
@jkoppe
Author

jkoppe commented Sep 25, 2015

I bumped it up to 5000 to make sure it fit in our entire infrastructure for the foreseeable future. Then I made a monitor.

Then I tried to edit my monitor later, and I can't edit the monitor: https://app.datadoghq.com/monitors#284027/edit

I wonder if the edit page for this monitor is broken because of the # of services. :)

@talwai
Contributor

talwai commented Sep 30, 2015

Hi @jkoppe, hmm, it's weird that your monitor edit page is broken. It shouldn't be that sensitive to an increase in tags - we'll take a look internally.

I apologize for having misunderstood you initially. If you look at the code, the consul.check service check is actually unrestricted by MAX_SERVICES right now. The limitations you were/are seeing regarding Service Checks + Monitors were potentially the result of this bug, a fix for which we just merged into master.

To further elaborate, MAX_SERVICES only limits the catalog-querying section of the check, where we collect some basic counts on nodes up by service, and services up by node. 50 may still be too low a number for this, so this is a discussion worth continuing. As to your original use-case, using the patched check from our master branch under /etc/dd-agent/checks.d should allow you to use consul.check monitors to your heart's content. If you don't want to patch manually, the patch will be part of our 5.5.2 release which should be out very shortly!
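
As a rough illustration of what a cap like MAX_SERVICES does to the catalog-querying section, here is a sketch under stated assumptions, not the actual check code. The function name `select_services`, the optional whitelist argument, and the choice to sort before truncating are all hypothetical.

```python
def select_services(services, whitelist=None, max_services=50):
    """Pick which catalog services to query, capped at `max_services`.

    Assumptions (not taken from the real check): an optional whitelist
    narrows the candidates first, and names are sorted so the same
    subset is chosen on every run.
    """
    if whitelist:
        allowed = set(whitelist)
        services = [name for name in services if name in allowed]
    return sorted(services)[:max_services]


catalog = ["web", "db", "cache", "queue"]
# With a cap of 2, only two services would have their counts collected.
print(select_services(catalog, max_services=2))
# A whitelist narrows the set before the cap is applied.
print(select_services(catalog, whitelist=["web", "queue"], max_services=50))
```

The point of the sketch is that the cap only bounds how many services get per-service counts; it says nothing about the consul.check service checks, which is the distinction made above.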

@jkoppe
Author

jkoppe commented Sep 30, 2015

Thanks. I patched consul.py on all of my servers and I now see what I'd expect! I also dropped MAX_SERVICES back to 50. I think this particular issue can be closed.

However, I made another monitor which is very similar to the other one I created, and it also isn't editable: https://app.datadoghq.com/monitors#287256/edit. Let me know if I should open a ticket with support on this or if I should just wait for you to check on here.

@talwai
Contributor

talwai commented Sep 30, 2015

Great to hear that the patch was helpful! Yea, would you mind opening a ticket directly with support regarding your issue with editing monitors?

Closing this issue as suggested

@talwai talwai closed this as completed Sep 30, 2015