Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected errors and warnings in the vulnerability detection E2E tests #5698

Closed
3 tasks done
Rebits opened this issue Aug 26, 2024 · 13 comments · Fixed by #5711
Closed
3 tasks done

Unexpected errors and warnings in the vulnerability detection E2E tests #5698

Rebits opened this issue Aug 26, 2024 · 13 comments · Fixed by #5711
Assignees
Labels

Comments

@Rebits
Copy link
Member

Rebits commented Aug 26, 2024

Description

Unexpected errors and warnings were reported in wazuh/wazuh#25363. It's necessary to perform a thorough research of them and include in case of they are expected in the errors to ignore dictionary in the vulnerability tests.

Evidences

Report: wazuh/wazuh#25363 (comment)

  • Errors detected:
{
    "manager1": {
        "ERROR": [],
        "CRITICAL": [],
        "WARNING": [
            "2024/08/25 19:48:46 wazuh-authd: WARNING: Duplicate name 'agent5', rejecting enrollment. Agent '002' has not been disconnected long enough to be replaced."
        ]
    },
    "manager2": {
        "ERROR": [
            "2024/08/25 19:48:44 wazuh-authd: ERROR: Could not send message through the cluster after '10' attempts."
        ],
        "CRITICAL": [],
        "WARNING": [
            "2024/08/25 19:48:35 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:36 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:37 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:38 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:39 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:40 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:41 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:42 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:43 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:44 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:44 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:45 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).",
            "2024/08/25 19:48:46 wazuh-authd: WARNING: 9008: Duplicate name"
        ]
    },
    "agent1": {
        "ERROR": [],
        "CRITICAL": [],
        "WARNING": []
    },
    "agent3": {
        "ERROR": [],
        "CRITICAL": [],
        "WARNING": []
    },
    "agent4": {
        "ERROR": [],
        "CRITICAL": [],
        "WARNING": [
            "2024/08/25 19:38:26 wazuh-agentd: WARNING: Couldn't connect to server '172.31.0.85': 'Agent version must be lower or equal to manager version'"
        ]
    },
    "agent5": {
        "ERROR": [
            "2024/08/25 19:47:48 wazuh-agentd: ERROR: (1208): Unable to connect to enrollment service at '[172.31.9.129]:1515'",
            "2024/08/25 19:48:13 wazuh-agentd: ERROR: (1208): Unable to connect to enrollment service at '[172.31.9.129]:1515'",
            "2024/08/25 19:48:46 wazuh-agentd: ERROR: Duplicate name (from manager)",
            "2024/08/25 19:48:46 wazuh-agentd: ERROR: Unable to add agent (from manager)"
        ],
        "CRITICAL": [],
        "WARNING": []
    },
    "agent6": {
        "ERROR": [],
        "CRITICAL": [],
        "WARNING": []
    },
    "agent2": {
        "ERROR": [
            "2024/08/25 19:48:00 wazuh-agent: ERROR: (1208): Unable to connect to enrollment service at '[172.31.9.129]:1515'\r",
            "2024/08/25 19:48:44 wazuh-agent: ERROR: Cannot comunicate with master (from manager)\r",
            "2024/08/25 19:48:44 wazuh-agent: ERROR: Unable to add agent (from manager)\r"
        ],
        "CRITICAL": [],
        "WARNING": []
    }
}

Tasks

@wazuh wazuh deleted a comment Aug 26, 2024
@wazuhci wazuhci moved this to Triage in Release 4.9.0 Aug 26, 2024
@wazuhci wazuhci moved this to Backlog in Release 4.9.1 Aug 26, 2024
@wazuhci wazuhci removed this from Release 4.9.0 Aug 26, 2024
@Rebits Rebits self-assigned this Aug 26, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.9.1 Aug 26, 2024
@Rebits
Copy link
Member Author

Rebits commented Aug 26, 2024

We can check that all these warnings/errors can be splitter to:

Unexpected connection error of agent5

During the registration process, the agent was successfully registered with the manager. However, a network error occurred, interrupting communication. As a result, the agent could not complete its registration. When it attempted to register again, it found that an agent with the same name was already present. The agent was successfully registered after 10 minutes.

2024/08/25 19:47:28 wazuh-agentd: ERROR: (1137): Lost connection with manager. Setting lock.
2024/08/25 19:47:28 wazuh-agentd: INFO: Closing connection to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:28 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:28 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:47:33 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:33 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:47:38 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:38 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:47:43 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:43 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:47:48 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:48 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:47:48 wazuh-agentd: INFO: Requesting a key from server: 172.31.9.129
2024/08/25 19:47:48 wazuh-agentd: ERROR: (1208): Unable to connect to enrollment service at '[172.31.9.129]:1515'
2024/08/25 19:47:53 wazuh-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: '172.31.9.129'. Ensure that the manager version is 'v4.9.0' or higher.
2024/08/25 19:47:53 wazuh-agentd: WARNING: Unable to connect to any server.
2024/08/25 19:47:53 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:53 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:47:58 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:47:58 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:03 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:03 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:08 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:08 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:11 wazuh-modulesd:syscollector: INFO: Starting evaluation.
2024/08/25 19:48:11 wazuh-modulesd: WARNING: Process locked due to agent is offline. Waiting for connection...
2024/08/25 19:48:13 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:13 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:13 wazuh-agentd: INFO: Requesting a key from server: 172.31.9.129
2024/08/25 19:48:13 wazuh-agentd: ERROR: (1208): Unable to connect to enrollment service at '[172.31.9.129]:1515'
2024/08/25 19:48:17 wazuh-logcollector: WARNING: Process locked due to agent is offline. Waiting for connection...
2024/08/25 19:48:18 wazuh-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: '172.31.9.129'. Ensure that the manager version is 'v4.9.0' or higher.
2024/08/25 19:48:18 wazuh-agentd: WARNING: Unable to connect to any server.
2024/08/25 19:48:18 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:18 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:23 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:23 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:28 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:28 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:33 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:33 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:38 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:38 wazuh-agentd: ERROR: (1216): Unable to connect to '[172.31.9.129]:1514/tcp': 'Connection refused'.
2024/08/25 19:48:38 wazuh-agentd: INFO: Requesting a key from server: 172.31.9.129
2024/08/25 19:48:44 wazuh-agentd: INFO: No authentication password provided
2024/08/25 19:48:44 wazuh-agentd: INFO: Using agent name as: agent5
2024/08/25 19:48:44 wazuh-agentd: INFO: Waiting for server reply
2024/08/25 19:48:46 wazuh-agentd: ERROR: Duplicate name (from manager)
2024/08/25 19:48:46 wazuh-agentd: ERROR: Unable to add agent (from manager)
2024/08/25 19:48:51 wazuh-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: '172.31.9.129'. Ensure that the manager version is 'v4.9.0' or higher.
2024/08/25 19:48:51 wazuh-agentd: WARNING: Unable to connect to any server.
2024/08/25 19:48:51 wazuh-agentd: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:51 wazuh-agentd: INFO: (4102): Connected to the server ([172.31.9.129]:1514/tcp).
  • Cluster errors in manager2: There was produce a cluster error in manager2, making the registration of the agent2 failed. The agent was correctly registered a few seconds after the error:
2024/08/25 19:48:49 wazuh-agent: INFO: Trying to connect to server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:49 wazuh-agent: INFO: (4102): Connected to the server ([172.31.9.129]:1514/tcp).
2024/08/25 19:48:49 wazuh-agent: INFO: Server responded. Releasing lock.
2024/08/25 19:48:52 wazuh-agent: INFO: Agent is now online. Process unlocked, continuing...

Conclusion

Both errors occurred in agents registered with manager2 due to a network issue, causing cluster and enrollment errors. These errors are not considered expected, so they have been excluded from the expected errors dictionary in the test. I suggest including a brief delay between the startup of the agent and the manager to prevent similar issues in the future.
The inclusion of logic to remove defined client.keys has been discarded, as this is not typical user behavior.

@Rebits
Copy link
Member Author

Rebits commented Aug 26, 2024

@Rebits
Copy link
Member Author

Rebits commented Aug 28, 2024

On hold due to no macOS are available: wazuh/wazuh#25345

@Rebits
Copy link
Member Author

Rebits commented Sep 2, 2024

In order to test this development along with #5699 it was created the branch tmp-fixes-4.9.0 that contains both branches.

I am currently testing over 4.9.0-rc2.
Build: https://ci.wazuh.info/job/Test_e2e_system/365/

@Rebits
Copy link
Member Author

Rebits commented Sep 3, 2024

A new build was launched to include newer changes in #5699 (comment)
Buid: https://ci.wazuh.info/job/Test_e2e_system/368/

@Rebits
Copy link
Member Author

Rebits commented Sep 4, 2024

Conclusion

The current approach results in constant updates of non-critical warnings in the expected list. To address this, we've decided to remove warning messages from the unexpected lists, reducing false positive errors related to manager connections, networking, and similar issues.

Additionally, a delay was introduced between the startup of the manager and the agents. This ensures the manager has time to fully initialize before the agents start, thereby preventing potential connection errors.

@wazuhci wazuhci moved this from In progress to Pending review in Release 4.9.1 Sep 4, 2024
@wazuhci wazuhci moved this from Pending review to In review in Release 4.9.1 Sep 4, 2024
@jseg380
Copy link
Member

jseg380 commented Sep 4, 2024

LGTM

@wazuhci wazuhci moved this from In review to Pending final review in Release 4.9.1 Sep 4, 2024
@wazuhci wazuhci moved this from Pending final review to On hold in Release 4.9.1 Sep 9, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 4.9.1 Sep 16, 2024
@Rebits
Copy link
Member Author

Rebits commented Sep 16, 2024

Waiting until #5715 to tests changes in the same build

@Rebits
Copy link
Member Author

Rebits commented Sep 16, 2024

@Rebits
Copy link
Member Author

Rebits commented Sep 18, 2024

Report analysis

The evidence can be found at #5698 (comment)

Cluster connection error before the test started

We can notice in the manager2 errors in authd and db daemons related to the registrations of some agents:

2024/09/17 10:07:43 wazuh-authd: INFO: (1225): SIGNAL [(15)-(Terminated)] Received. Exit Cleaning...
2024/09/17 10:07:43 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:44 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:45 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:46 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:47 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:48 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:49 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:50 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:51 wazuh-authd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused (111).
2024/09/17 10:07:51 wazuh-authd: ERROR: Could not send message through the cluster after '10' attempts.
...
2024/09/17 10:08:12 wazuh-db: ERROR: There was an error assigning the groups to agent '001'
2024/09/17 10:08:12 wazuh-db: WARNING: The groups were empty right after the set for agent '001'
2024/09/17 10:08:12 wazuh-db: ERROR: There was an error assigning the groups to agent '002'
2024/09/17 10:08:12 wazuh-db: WARNING: The groups were empty right after the set for agent '002'
2024/09/17 10:08:12 wazuh-db: ERROR: There was an error assigning the groups to agent '003'
2024/09/17 10:08:12 wazuh-db: WARNING: The groups were empty right after the set for agent '003'
2024/09/17 10:08:12 wazuh-db: ERROR: There was an error assigning the groups to agent '004'
2024/09/17 10:08:12 wazuh-db: WARNING: The groups were empty right after the set for agent '004'
...
2024/09/17 10:08:18 wazuh-remoted: WARNING: (1408): Invalid ID 005 for the source ip: '172.31.13.221' (name 'unknown').
2024/09/17 10:08:23 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.
2024/09/17 10:08:23 wazuh-remoted: INFO: (1410): Reading authentication keys file.
2024/09/17 10:08:46 wazuh-authd: INFO: New connection from 172.31.1.115
2024/09/17 10:08:46 wazuh-authd: INFO: Received request for a new agent (agent5) from: 172.31.1.115
2024/09/17 10:08:46 wazuh-authd: INFO: Dispatching request to master node
2024/09/17 10:08:46 wazuh-authd: INFO: Agent key generated for 'agent5' (requested by any)
2024/09/17 10:08:52 wazuh-db: WARNING: The groups were empty right after the set for agent '006'

This behavior has led to agent2 not being registered, making failing the initial tests. However, we can see that the agent in the rest of the tests registers with the manager and its vulnerabilities are being detected correctly.

These errors appear before the tests start, during the provision stages. A similar error was reported in wazuh/wazuh#20915, wazuh/wazuh#22855 and wazuh/wazuh#16464. This race condition was [discarded](see wazuh/wazuh#22855 (comment)) due to the low impact (agent finally registered).

Regarding Vulnerability E2E, it was created an issue to mitigate this issue #5746

Missing vulnerabilities

Known issues specified in wazuh/wazuh#25506

@Rebits Rebits moved this from In progress to Pending final review in Release 4.9.1 Sep 18, 2024
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.9.1 Sep 18, 2024
@juliamagan
Copy link
Member

#5746 should not be necessary, since what is intended to be fixed in the new issue are the same connection errors reported in the main comment, so this should be fixed in this issue itself.

@wazuhci wazuhci moved this from In final review to On hold in Release 4.9.1 Sep 18, 2024
@Rebits Rebits moved this from On hold to In progress in Release 4.9.1 Sep 18, 2024
@Rebits
Copy link
Member Author

Rebits commented Sep 18, 2024

In the provisioning trace, we can see the following error that could specifically explain the errors regarding the cluster

19:53:52  TASK [../roles/wazuh/ansible-wazuh-manager : Install Wazuh Manager from .deb packages] ***
19:53:52  fatal: [master]: FAILED! => {
19:53:52      "changed": false
19:53:52  }
19:53:52  
19:53:52  STDERR:
19:53:52  
19:53:52  dpkg: error: dpkg frontend lock was locked by another process with pid 14257
19:53:52  Note: removing the lock file is always wrong, and can end up damaging the
19:53:52  locked area and the entire system. See <https://wiki.debian.org/Teams/Dpkg/FAQ>.
19:53:52  
19:53:52  
19:53:52  
19:53:52  MSG:
19:53:52  
19:53:52  dpkg --force-confdef --force-confold -i /tmp/ansible-tmp-1724612019.8497238-7973-66496375523271/wazuh-manager_4.9.0-1_amd64y_x3ilkb.deb failed

It seems that the retry in the provisioning could lead to an unstable state where is more probable that unexpected errors or warnings like the specified in the issue occur.

I suggest including a retry logic in this task.

@Rebits
Copy link
Member Author

Rebits commented Sep 18, 2024

Build after applying changes in provisioning methods: https://ci.wazuh.info/job/Test_e2e_system/384

@Rebits Rebits moved this from In progress to Pending final review in Release 4.9.1 Sep 19, 2024
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.9.1 Sep 19, 2024
@wazuhci wazuhci moved this from In final review to Done in Release 4.9.1 Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants