[Linux] Fix system-probe enablement conditions #336

Merged on Mar 30, 2021 (11 commits)
18 changes: 10 additions & 8 deletions README.md
@@ -163,7 +163,7 @@ The following variables are available for live processes:

#### System probe

The system probe is configured under the `network_config` variable. Any variables nested underneath are written to the `system-probe.yaml`.
The system probe is configured under the `system_probe_config` variable. Any variables nested underneath are written to the `system-probe.yaml`, in the `system_probe_config` section.

[Network Performance Monitoring][7] (NPM) is configured under the `network_config` variable. Any variables nested underneath are written to the `system-probe.yaml`, in the `network_config` section.

@@ -183,13 +183,15 @@ network_config:
  enabled: true
```

Once modification is complete, follow the steps below:
**Note**: This configuration works with Agent 6.24.1+ and 7.24.1+. For older Agent versions, refer to [the public documentation][8] on how to enable system-probe.

On Linux, once this modification is complete, follow the steps below if you installed an Agent version older than 6.18.0 or 7.18.0:
Review comment by @KSerrania (Contributor, Author), Mar 26, 2021:
Added this because on later versions, this should not be needed since the datadog-agent-sysprobe service is tied to the datadog-agent service.

Another option would be to automate these steps, but I'm not sure it's worth spending time on it given that this only affects a small number of old versions.


1. Start the system-probe: `sudo service datadog-agent-sysprobe start` **Note**: If the service wrapper is not available on your system, run this command instead: `sudo initctl start datadog-agent-sysprobe`.
2. [Restart the Agent][8]: `sudo service datadog-agent restart`.
2. [Restart the Agent][9]: `sudo service datadog-agent restart`.
3. Enable the system-probe to start on boot: `sudo service enable datadog-agent-sysprobe`.

For manual setup, refer to the [NPM][9] documentation.
For manual setup, refer to the [NPM][8] documentation.

#### Agent v5

@@ -334,7 +336,7 @@ To downgrade to a prior version of the Agent:

Below are some sample playbooks to assist you with using the Datadog Ansible role.

The following example sends data to Datadog US (default), enables logs, and configures a few checks.
The following example sends data to Datadog US (default), enables logs and NPM, and configures a few checks.

```yml
- hosts: servers
@@ -403,7 +405,7 @@ The following example sends data to Datadog US (default), enables logs, and conf
        version: 1.11.0
      datadog-postgres:
        action: remove
    system_probe_config:
    network_config:
      enabled: true
```

@@ -530,6 +532,6 @@ For more details, see [Critical Bug in Uninstaller for Datadog Agent 6.14.0 and
[5]: https://github.com/DataDog/integrations-core
[6]: https://docs.datadoghq.com/infrastructure/process/
[7]: https://docs.datadoghq.com/network_performance_monitoring/
[8]: https://docs.datadoghq.com/agent/guide/agent-commands/#restart-the-agent
[9]: https://docs.datadoghq.com/network_performance_monitoring/installation/?tab=agent#setup
[8]: https://docs.datadoghq.com/network_performance_monitoring/installation/?tab=agent#setup
[9]: https://docs.datadoghq.com/agent/guide/agent-commands/#restart-the-agent
[10]: https://app.datadoghq.com/help/agent_fix
4 changes: 4 additions & 0 deletions ci_test/install_agent_6.yaml
@@ -18,6 +18,10 @@
env: dev
trace.concentrator:
extra_aggregators: version
system_probe_config:
sysprobe_socket: /opt/datadog-agent/run/sysprobe.sock
network_config:
enabled: true
datadog_checks:
process:
init_config:
4 changes: 4 additions & 0 deletions ci_test/install_agent_7.yaml
@@ -18,6 +18,10 @@
env: dev
trace.concentrator:
extra_aggregators: version
system_probe_config:
sysprobe_socket: /opt/datadog-agent/run/sysprobe.sock
network_config:
enabled: true
datadog_checks:
process:
init_config:
6 changes: 2 additions & 4 deletions defaults/main.yml
@@ -2,11 +2,9 @@
role_version: 4.7.1

# default system-probe.yaml options
system_probe_config:
  enabled: false
system_probe_config: {}

network_config:
  enabled: false
network_config: {}

# define if the datadog-agent services should be enabled
datadog_enabled: yes
6 changes: 6 additions & 0 deletions handlers/main.yml
@@ -1,5 +1,11 @@
---

- name: restart datadog-agent-sysprobe
  service:
    name: datadog-agent-sysprobe
    state: restarted
  when: datadog_enabled and datadog_sysprobe_enabled and not ansible_check_mode and not ansible_facts.os_family == "Windows"

- name: restart datadog-agent
  service:
    name: datadog-agent
3 changes: 2 additions & 1 deletion manual_tests/test_6_full.yml
@@ -6,13 +6,14 @@
    datadog_api_key: "123456"
    datadog_agent_allow_downgrade: true
    system_probe_config:
      enabled: true
      source_excludes:
        "*":
          - 8301
      dest_excludes:
        "*":
          - 8301
    network_config:
      enabled: true
    datadog_config:
      tags: "mytag0, mytag1"
      log_level: INFO
3 changes: 2 additions & 1 deletion manual_tests/test_7_full.yml
@@ -6,13 +6,14 @@
    datadog_api_key: "123456"
    datadog_agent_allow_downgrade: true
    system_probe_config:
      enabled: true
      source_excludes:
        "*":
          - 8301
      dest_excludes:
        "*":
          - 8301
    network_config:
      enabled: true
    datadog_config:
      tags: "mytag0, mytag1"
      log_level: INFO
74 changes: 56 additions & 18 deletions tasks/agent-linux.yml
@@ -1,8 +1,19 @@
---
- name: populate service facts
- name: Populate service facts
  service_facts:

- name: add "{{ datadog_user }}" user to additional groups
- name: Set before 6/7.24.1 flag
  set_fact:
    datadog_before_7241: "{{ datadog_major is defined and datadog_minor is defined and datadog_bugfix is defined
      and datadog_major | int < 8
      and (datadog_minor | int < 24 or (datadog_minor | int == 24 and datadog_bugfix | int < 1)) }}"

- name: Set before 6/7.18.0 flag
  set_fact:
    datadog_before_7180: "{{ datadog_major is defined and datadog_minor is defined
      and datadog_major | int < 8 and datadog_minor | int < 18 }}"

- name: Add "{{ datadog_user }}" user to additional groups
  user: name="{{ datadog_user }}" groups="{{ datadog_additional_groups }}" append=yes
  when: datadog_additional_groups | default([], true) | length > 0
  notify: restart datadog-agent
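
The two version flags set in this hunk are easy to misread in Jinja form. As a rough illustration only (not part of this PR), the same comparisons can be sketched in Python. The function names are hypothetical, and `None` is an assumption standing in for an undefined Ansible variable.

```python
def _to_int(value) -> int:
    """Roughly mimic Jinja's `int` filter: fall back to 0 on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def before_7241(major=None, minor=None, bugfix=None) -> bool:
    """Hypothetical mirror of datadog_before_7241 (strictly older than 6/7.24.1)."""
    # Assumption: None plays the role of "variable is not defined" in Ansible.
    if major is None or minor is None or bugfix is None:
        return False
    major, minor, bugfix = _to_int(major), _to_int(minor), _to_int(bugfix)
    return major < 8 and (minor < 24 or (minor == 24 and bugfix < 1))


def before_7180(major=None, minor=None) -> bool:
    """Hypothetical mirror of datadog_before_7180 (strictly older than 6/7.18.0)."""
    if major is None or minor is None:
        return False
    return _to_int(major) < 8 and _to_int(minor) < 18


if __name__ == "__main__":
    print(before_7241("7", "24", "0"))  # pinned 7.24.0
    print(before_7180("6", "17"))       # pinned 6.17.x
    print(before_7241())                # no pinned version
```

Under these rules an install without a pinned version is never treated as an old Agent, which is consistent with the comment later in this file that `datadog_minor` is only defined when a specific Agent version is given.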
@@ -78,40 +89,55 @@
    mode: 0640
    owner: "root"
    group: "{{ datadog_group }}"
  notify:
    "{% if datadog_before_7180 %}restart datadog-agent-sysprobe{% else %}restart datadog-agent{% endif %}"

- name: Ensure datadog-agent is running
  service:
    name: datadog-agent
    state: started
    enabled: yes
  when: not datadog_skip_running_check and datadog_enabled and not ansible_check_mode

- name: set system probe installed
- name: Set system probe installed
  set_fact:
    datadog_sysprobe_installed: "{{ ansible_facts.services['datadog-agent-sysprobe'] is defined
      or ansible_facts.services['datadog-agent-sysprobe.service'] is defined }}"
  when: not datadog_skip_running_check

- name: set system probe enabled
# Before 6/7.24.1, system_probe_config controls the system-probe service
# datadog_minor is only defined when a specific Agent version is given
# (see tasks/parse-version.yml)
- name: Set system probe enabled (before 6/7.24.1)
  set_fact:
    datadog_sysprobe_enabled: "{{ system_probe_config is defined
      and 'enabled' in (system_probe_config | default({}, true))
      and system_probe_config['enabled']
      and datadog_sysprobe_installed }}"
  when: not datadog_skip_running_check
    and datadog_before_7241

- name: Ensure datadog-agent-sysprobe is running if enabled and installed
# Since 6/7.24.1, setting enabled: true in network_config is enough to start the system-probe service:
# https://docs.datadoghq.com/network_monitoring/performance/setup/?tab=agent#setup
- name: Set system probe enabled (since 6/7.24.1)
  set_fact:
    datadog_sysprobe_enabled: "{{
      ((system_probe_config is defined
      and 'enabled' in (system_probe_config | default({}, true))
      and system_probe_config['enabled'])
      or (network_config is defined
      and 'enabled' in (network_config | default({}, true))
      and network_config['enabled']))
      and datadog_sysprobe_installed }}"
  when: not datadog_skip_running_check
    and (not datadog_before_7241)
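
For reviewers who want to check the two `datadog_sysprobe_enabled` expressions against each other, here is a minimal Python sketch of the decision, written as an illustration rather than as the role's implementation. The function names and the plain-dict inputs are assumptions; `before_7241` and `installed` correspond to the `datadog_before_7241` and `datadog_sysprobe_installed` facts.

```python
def section_enabled(section) -> bool:
    """True when the section is a mapping holding a truthy 'enabled' key."""
    section = section or {}  # same intent as `| default({}, true)` in Jinja
    return bool(section.get("enabled", False))


def sysprobe_enabled(system_probe_config, network_config, installed, before_7241) -> bool:
    """Hypothetical mirror of the two set_fact tasks in this hunk."""
    if before_7241:
        # Older Agents: only system_probe_config controls the service.
        wanted = section_enabled(system_probe_config)
    else:
        # 6/7.24.1 and later: network_config.enabled is also sufficient.
        wanted = section_enabled(system_probe_config) or section_enabled(network_config)
    return bool(wanted and installed)


if __name__ == "__main__":
    # NPM enabled through network_config only, on a recent Agent.
    print(sysprobe_enabled({}, {"enabled": True}, installed=True, before_7241=False))
    # Same variables on an Agent older than 6/7.24.1.
    print(sysprobe_enabled({}, {"enabled": True}, installed=True, before_7241=True))
```

The second call shows the case the README note in this PR warns about: on Agents older than 6.24.1 or 7.24.1, setting only `network_config.enabled` does not mark the system-probe as enabled.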

- name: Ensure datadog-agent is running
  service:
    name: datadog-agent-sysprobe
    name: datadog-agent
    state: started
    enabled: yes
  when: not datadog_skip_running_check and datadog_enabled and not ansible_check_mode and datadog_sysprobe_enabled
  when: not datadog_skip_running_check and datadog_enabled and not ansible_check_mode

- name: Ensure datadog-agent-sysprobe is stopped if disabled or not installed
- name: Ensure datadog-agent-sysprobe is running if enabled and installed
  service:
    name: datadog-agent-sysprobe
    state: stopped
    enabled: no
  when: not datadog_skip_running_check and (not datadog_enabled or not datadog_sysprobe_enabled) and datadog_sysprobe_installed
    state: started
    enabled: yes
  when: not datadog_skip_running_check and datadog_enabled and not ansible_check_mode and datadog_sysprobe_enabled

- name: Ensure datadog-agent, datadog-agent-process and datadog-agent-trace are not running
  service:
@@ -124,6 +150,18 @@
    - datadog-agent-process
    - datadog-agent-trace

# Stop system-probe manually on Agent versions < 6/7.18, as it was not tied
# to the main Agent service: https://github.com/DataDog/datadog-agent/pull/4883
- name: Ensure datadog-agent-sysprobe is stopped if disabled or not installed (before 6/7.18.0)
  service:
    name: datadog-agent-sysprobe
    state: stopped
    enabled: no
  when: not datadog_skip_running_check
    and (not datadog_enabled or not datadog_sysprobe_enabled)
    and datadog_before_7180
    and datadog_sysprobe_installed

- name: Ensure datadog-agent-security is not running
  service:
    name: datadog-agent-security
2 changes: 1 addition & 1 deletion templates/system-probe.yaml.j2
@@ -1,6 +1,6 @@
# Managed by Ansible

{% if system_probe_config is defined and system_probe_config|length > 0 -%}
{% if system_probe_config is defined and system_probe_config | default({}, true) | length > 0 -%}
system_probe_config:
{# The "first" option is only supported by jinja 2.10+
which is not present on older systems (CentOS 7, Debian 8, etc.)
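
The template change wraps `system_probe_config` in `default({}, true)` before taking its length. The PR does not state the motivation; one plausible reading, offered here as an assumption, is that it guards against a variable that is defined but empty (for example a bare `system_probe_config:` key, which YAML loads as null). The Python sketch below illustrates that difference with hypothetical helper names.

```python
def has_entries_old(cfg) -> bool:
    """Analogue of `system_probe_config | length > 0` (the previous condition)."""
    return len(cfg) > 0  # raises TypeError when cfg is None


def has_entries_new(cfg) -> bool:
    """Analogue of `system_probe_config | default({}, true) | length > 0`."""
    return len(cfg or {}) > 0  # any falsy value is treated as an empty mapping


if __name__ == "__main__":
    print(has_entries_new(None))
    print(has_entries_new({"sysprobe_socket": "/opt/datadog-agent/run/sysprobe.sock"}))
    try:
        has_entries_old(None)
    except TypeError as exc:
        print("old condition fails on None:", exc)
```

With the new default of `system_probe_config: {}` in `defaults/main.yml`, both forms skip the section for an untouched role, so the guard only matters when a user overrides the variable with an empty value.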