Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes IndexError from killing agent when no process table returned #3043

Merged
merged 1 commit into from
Dec 2, 2016

Conversation

ahamilton55
Copy link
Contributor

Note: Please remember to review the Datadog Contribution Guidelines
if you have not yet done so.

What does this PR do?

The pull request moves the removal of the process table header into the try block above to prevent an IndexError from crashing the process. The general standard error will catch the IndexError and not require an additional error class

Motivation

We had a system that filled its disk so the output of the process table returned was empty and cause the process to fail. Other errors are caught when running the ps command so I figured why not catch this one at the same time if ps command succeeds but nothing is returned.

Testing Guidelines

I looked at the testing but I couldn't any tests that test this file. If I could get a pointer of where to look then I'd be happy to try to add in a test.

Additional Notes

Error from the logs:

2016-11-23 01:08:05 UTC | ERROR | dd.collector | checks.collector(logger.py:23) | Uncaught exception while running run
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/utils/logger.py", line 20, in wrapper
    result = func(*args, **kwargs)
  File "/opt/datadog-agent/agent/checks/collector.py", line 349, in run
    processes = sys_checks['processes'].check(self.agentConfig)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 557, in check
    del processLines[0]  # Removes the headers
IndexError: list assignment index out of range
2016-11-23 01:08:05 UTC | ERROR | dd.collector | collector(agent.py:428) | Uncaught error running the Agent
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/agent.py", line 424, in <module>
    sys.exit(main())
  File "/opt/datadog-agent/agent/agent.py", line 367, in main
    agent.start(foreground=True)
  File "/opt/datadog-agent/agent/daemon.py", line 175, in start
    self.run()
  File "/opt/datadog-agent/agent/agent.py", line 198, in run
    configs_reloaded=self.reload_configs_flag)
  File "/opt/datadog-agent/agent/utils/logger.py", line 20, in wrapper
    result = func(*args, **kwargs)
  File "/opt/datadog-agent/agent/checks/collector.py", line 349, in run
    processes = sys_checks['processes'].check(self.agentConfig)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 557, in check
    del processLines[0]  # Removes the headers
IndexError: list assignment index out of range

- Places process table header deletion in a try block to prevent an IndexError
  from occuring
@masci masci merged commit be37ae6 into DataDog:master Dec 2, 2016
@olivielpeau olivielpeau added this to the 5.10.2 milestone Dec 6, 2016
degemer added a commit that referenced this pull request Dec 21, 2016
* master: (53 commits)
  [nginx] Update example config
  [service_discovery] Add a Zookeeper service discovery implementation.
  [aggregator] if sample rate is bad, fix it but still parse tags. (#3073)
  [yarn] whitelist authorized application_tags
  Alex poe/update jmx with refresh beans (#3068)
  [config] Fix `_is_affirmative` when passed argument is `None` (#3063)
  Send all configured tags with process checks. (#2976)
  fix flake8 errors
  [flare] ignore whitespace before proxy credentials
  [core] add a switch to disable profiling, but still use developer mode (#2898)
  [tests] allow tests to use the additional_checksd parameter (#3056)
  [service_discovery][jmx] trying to pick-up JMX changes with SD. (#3010)
  [install_script] Make `dd-agent` group of `datadog.conf` (#3036)
  [postgres] Allow disable postgresql.database_size (#3035)
  [core] Fixes IndexError for process lookup (#3043)
  remove warning message leaking password strings (#3053)
  trap psutil.NoSuchProcess exception (#3052)
  Fix grammar and casing in exception text (#3050)
  allow override of kubelet host with KUBERNETES_KUBELET_HOST env var
  [service discovery] properly handle config reload for removed containers
  ...
@masci masci modified the milestones: 5.10.2, 5.12.0 Jan 24, 2017
derekwbrown added a commit that referenced this pull request Mar 9, 2017
Original submission moved the deletion of list index zero inside the 'try'
block, with the net effect that it would be skipped on any exception.  The
exception that's actually occurring is SubprocessOutputEmptyError.  So,
catch all exceptions and return a failure, rather than attempting to parse
potentially malformed data returned in an exceptoin case.

Original PR #3043
derekwbrown added a commit that referenced this pull request Mar 9, 2017
Original submission moved the deletion of list index zero inside the 'try'
block, with the net effect that it would be skipped on any exception.  The
exception that's actually occurring is SubprocessOutputEmptyError.  So,
catch all exceptions and return a failure, rather than attempting to parse
potentially malformed data returned in an exceptoin case.

Original PR #3043
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants