Race condition with collector and jmxfetch startup #3284

Closed
sdwr98 opened this issue Mar 24, 2017 · 2 comments

sdwr98 commented Mar 24, 2017

**Output of the info page**

root@389f02b95b41:/etc/dd-agent# service datadog-agent info
====================
Collector (v 5.12.0)
====================

  Status date: 2017-03-24 13:06:19 (16s ago)
  Pid: 19
  Platform: Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-debian-8.7
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log

  Clocks
  ======

    NTP offset: 0.0022 s
    System UTC time: 2017-03-24 13:06:35.609536

  Paths
  =====

    conf.d: /etc/dd-agent/conf.d
    checks.d: /opt/datadog-agent/agent/checks.d

  Hostnames
  =========

    ec2-hostname: ip-10-1-1-182.us-west-2.internal.motusclouds.com
    local-ipv4: 10.1.1.182
    socket-hostname: 389f02b95b41
    hostname: mslv-stage-51789a7f15.us-west-2.internal.motusclouds.com
    local-hostname: ip-10-1-1-182.us-west-2.internal.motusclouds.com
    instance-id: i-07ba81951789a7f15
    socket-fqdn: 389f02b95b41

  Checks
  ======

    nginx (5.12.0)
    --------------
      - instance #0 [OK]
      - Collected 7 metrics, 0 events & 1 service check

    php_fpm (5.12.0)
    ----------------
      - instance #0 [OK]
      - instance #1 [OK]
      - Collected 12 metrics, 0 events & 2 service checks

    ntp (5.12.0)
    ------------
      - Collected 0 metrics, 0 events & 0 service checks

    disk (5.12.0)
    -------------
      - instance #0 [OK]
      - Collected 40 metrics, 0 events & 0 service checks

    docker_daemon (5.12.0)
    ----------------------
      - instance #0 [OK]
      - Collected 105 metrics, 0 events & 1 service check


  Emitters
  ========

    - http_emitter [OK]

====================
Dogstatsd (v 5.12.0)
====================

  Status date: 2017-03-24 13:06:27 (8s ago)
  Pid: 16
  Platform: Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-debian-8.7
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/dogstatsd.log

  Flush count: 118
  Packet Count: 0
  Packets per second: 0.0
  Metric count: 1
  Event count: 0
  Service check count: 0

====================
Forwarder (v 5.12.0)
====================

  Status date: 2017-03-24 13:06:35 (0s ago)
  Pid: 15
  Platform: Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-debian-8.7
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/forwarder.log

  Queue Size: 0 bytes
  Queue Length: 0
  Flush Count: 408
  Transactions received: 241
  Transactions flushed: 241
  Transactions rejected: 0
  API Key Status: API Key is valid


======================
Trace Agent (v 5.12.0)
======================

  Not running (port 8126)

Additional environment details (Operating System, Cloud provider, etc):
This agent is running inside a docker container, configured for service discovery.

Steps to reproduce the issue:

  1. Configure datadog agent for JMX service discovery
  2. Start up the agent container
  3. Sometimes jmxfetch starts up before the collector, so jmxfetch can't open the named pipe

Describe the results you received:
The following log messages show the issue:

collector.log:

2017-03-24 12:46:46 UTC | INFO | dd.collector | collector(agent.py:451) | JMX SD Config via named pip jmx_0 successfully.

and

jmxfetch.log:

2017-03-24 12:46:45,327 | WARN | App | Unable to open named pipe - Service Discovery disabled.

Note that the named pipe was created a second after jmxfetch tried to open it.

Describe the results you expected:
I expect that the collector will create the named pipe before jmxfetch tries to read from it.

Additional information you deem important (e.g. issue happens only occasionally):
This is intermittent, and seems to happen more frequently when there are more docker containers running on the host.
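
For reference, the failure mode comes down to a reader trying to open a FIFO that the writer has not created yet. A minimal sketch of that ordering problem (the pipe path and messages below are hypothetical, not the Agent's actual code):

```python
import errno
import os

PIPE_PATH = "/tmp/jmx_0"  # hypothetical path standing in for the Agent's SD pipe

# Writer side (collector, conceptually): create the FIFO, then write configs to it.
# os.mkfifo(PIPE_PATH)

# Reader side (jmxfetch, conceptually): if it runs before mkfifo() has happened,
# the open fails with ENOENT and Service Discovery gets disabled.
try:
    fd = os.open(PIPE_PATH, os.O_RDONLY | os.O_NONBLOCK)
except OSError as e:
    if e.errno == errno.ENOENT:
        print("named pipe does not exist yet - Service Discovery disabled")
    else:
        raise
```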

@olivielpeau (Member) commented

Hi @sdwr98, and thanks for reporting this issue. There is indeed a potential race condition in the current Agent when the collector takes some time to start (generally because of queries to a service discovery backend).

I've opened 2 PRs (DataDog/jmxfetch#135 and #3306) to address this issue; they should hopefully fix it. We'll work on getting these fixes merged and released for the next minor version of the Agent.
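
The general way to tolerate this kind of race on the reader side is to wait for the pipe to appear before opening it. The sketch below only illustrates that pattern; it is not the code from the PRs above, and the function name, path, and timeouts are made up:

```python
import os
import time

def open_pipe_when_ready(path, timeout_s=60.0, poll_s=2.0):
    """Wait for a named pipe to exist, then open it for non-blocking reads."""
    deadline = time.time() + timeout_s
    while not os.path.exists(path):
        if time.time() > deadline:
            raise RuntimeError("timed out waiting for named pipe at %s" % path)
        time.sleep(poll_s)
    # O_NONBLOCK lets the read-side open succeed even if no writer is attached yet.
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)

# Example: fd = open_pipe_when_ready("/tmp/jmx_0")
```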

olivielpeau added this to the 5.13.0 milestone on Apr 17, 2017
@olivielpeau (Member) commented

Should be fixed in the latest release (5.13.0), closing
