Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[service_discovery][jmx] trying to pick-up JMX changes with SD. #3010

Merged
merged 7 commits into from
Dec 2, 2016

Conversation

truthbk
Copy link
Member

@truthbk truthbk commented Nov 9, 2016

What does this PR do?

Enables service discovery for JMX integrations.

Once a service discovery index config is detected, we parse the config from JSON (as stored in etcd/consul) to YAML and the relayed over to JMXfetch over a named pipe dd-service_discovery that lives in /tmp by default.

If multiple changes are detected, these are all submitted to JMXfetch by a single write operation to avoid race conditions. The pipe should take care of locking from the reading and writing ends, and by dropping all relevant configs in a single write operation we can guarantee all configs will be read on the JMXfetch in a single iteration and thus help avoid having multiple versions of the same config in the pipe at any given time.

Motivation

Enabling Service Discovery for JMXfetch integrations.

Testing Guidelines

An overview on testing
is available in our contribution guidelines.

Additional Notes

Related PR: DataDog/jmxfetch#113

@truthbk truthbk force-pushed the jaime/pipedsd branch 2 times, most recently from dd70cb1 to 7c96a62 Compare November 9, 2016 17:59
JMX_GRACE_SECS = 2
SERVICE_DISCOVERY_PREFIX = 'SD-'
SD_PIPE_NAME = "dd-service_discovery"
SD_PIPE_UNIX_PATH = "/tmp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick note: I think using /opt/datadog-agent/run is more appropriate and easier to adapt to Windows, unless you have a good reason to use /tmp. 😃

Copy link
Member Author

@truthbk truthbk Nov 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named pipes are only supported on windows in a special path (and I don't even think all versions will support it anyways), so we'll need a specific path for windows anyway.

I used /tmp to avoid issues with the source installs.

@truthbk
Copy link
Member Author

truthbk commented Nov 9, 2016

TODO: I could probably add some tests for the generate_jmx_configs function.

@truthbk truthbk force-pushed the jaime/pipedsd branch 2 times, most recently from 177649c to b9f8d01 Compare November 16, 2016 18:45
Copy link
Member

@hkaj hkaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @truthbk ! This will be a major improvement for Java apps monitoring :)
Left a few comments.


if not os.path.exists(pipe_name):
os.mkfifo(pipe_name)
self.sd_pipe = os.open(pipe_name, os.O_RDWR) # RW to avoid blocking (will only W)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should create the fifo if SD is not enabled

@@ -1,4 +1,3 @@
# (C) Datadog, Inc. 2010-2016
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep this guy around

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I got rid of that! 😓

try:
yaml = config_to_yaml(check_config)
# generated["{}_{}".format(check_name, idx)] = yaml
generated["{}_{}".format(check_name, 0)] = yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the _0 for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was actually a product of an original misunderstanding on my part. But then I saw some utility for it (although I have to take another look at this code) - for example simultaneous containers for the same service, but with different tags or whatever, that could have different configurations. On the JMX side we'd use the check name as the key in the map, so this would allow us to support that in the future, that's why we have the commented line on the top. We can discuss and possibly remove.

log.debug("YAML generated: %s", yaml)
except Exception as e:
log.exception("Unable to generate YAML config for %s: %s", check_name, e)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except:
    log.exception("Unable to generate YAML config for %s", check_name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@@ -123,6 +123,9 @@ gce_updated_hostname: yes
# and modify its value.
# sd_template_dir: /datadog/check_configs
#
# JMX Service Disocvery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better description would be nice, something like Enable service discovery for JMX checks ?


return yaml_output

def dump_yaml(path, config):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem to be used anywhere

@@ -216,9 +248,17 @@ def run(self, config=None):
if self._agentConfig.get('service_discovery'):
self.sd_backend = get_sd_backend(self._agentConfig)

# Initialize Supervisor proxy (unix specific)
self.supervisor_proxy = self._get_supervisor_socket(self._agentConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to init that if SD is not enabled?

# Load JMX configs if available
jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname)
if jmx_sd_configs:
self._submit_jmx_service_discovery(jmx_sd_configs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, this will trigger even without SD, is that expected?

@hkaj
Copy link
Member

hkaj commented Nov 21, 2016

LGTM

[service_discovery] adding named pipe for IPC.

[util] if file doesnt exist, create it.

[service_discovery][jmxfetch] add some service discovery configurability.

[jmxfetch][service_discovery] increase max heap size for SD to 512.

[service_discovery][jmxfetch] create jmx service discovery option in config.

[service_discovery][jmx] if jmx is not up, try to bring it up - wait for it to boot too.

[service_discovery] if the previous index was None, consider that a change (bootup changes to apply).

[service_discovery][jmx] cleaning up service discovery/supervisor interactions.

[service_discovery][jmx] supervisor_proxy should not be None, cleanup.

[service_discovery][jmxfetch] avoid double-reload of checks at start-up. Refactor.
@truthbk
Copy link
Member Author

truthbk commented Dec 2, 2016

No changes to the logic, just added the test. We're green, so merging! Thanks @hkaj

@truthbk truthbk merged commit 13d63df into master Dec 2, 2016
@truthbk truthbk deleted the jaime/pipedsd branch December 2, 2016 19:28
@olivielpeau olivielpeau added this to the 5.11.0 milestone Dec 6, 2016
degemer added a commit that referenced this pull request Dec 21, 2016
* master: (53 commits)
  [nginx] Update example config
  [service_discovery] Add a Zookeeper service discovery implementation.
  [aggregator] if sample rate is bad, fix it but still parse tags. (#3073)
  [yarn] whitelist authorized application_tags
  Alex poe/update jmx with refresh beans (#3068)
  [config] Fix `_is_affirmative` when passed argument is `None` (#3063)
  Send all configured tags with process checks. (#2976)
  fix flake8 errors
  [flare] ignore whitespace before proxy credentials
  [core] add a switch to disable profiling, but still use developer mode (#2898)
  [tests] allow tests to use the additional_checksd parameter (#3056)
  [service_discovery][jmx] trying to pick-up JMX changes with SD. (#3010)
  [install_script] Make `dd-agent` group of `datadog.conf` (#3036)
  [postgres] Allow disable postgresql.database_size (#3035)
  [core] Fixes IndexError for process lookup (#3043)
  remove warning message leaking password strings (#3053)
  trap psutil.NoSuchProcess exception (#3052)
  Fix grammar and casing in exception text (#3050)
  allow override of kubelet host with KUBERNETES_KUBELET_HOST env var
  [service discovery] properly handle config reload for removed containers
  ...
@masci masci modified the milestones: 5.11.0, 5.12.0 Jan 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants