-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-enable resource leak tests, change timeout, improve message #4454
Re-enable resource leak tests, change timeout, improve message #4454
Conversation
This pull request does not have a backport label. Could you fix it @fearful-symmetry? 🙏
NOTE: |
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
…aky-test-handle-leaks
In case you are missing the discussion in Slack, there was a change in 8.14.0-SNAPSHOT in Fleet that changed the output name from |
|
…arch type (#179218) ## Summary Related to #178857 and #177927 It seems that using output id instead of "default" in full agent policy had a higher impact than expected. There are a few places where agent relies on the name "default". ([This](elastic/elastic-agent#4454) and [this](elastic/elastic-agent#4453) pr) Because of this, doing a partial revert, to keep using "default" for elasticsearch output type to avoid breaking change. However, for other types, using the output id. This will fix the original issue of remote output health reporting. I think it is a rarely used feature to use a non-elasticsearch output as default, so it shouldn't have a big impact to not use "default" output name for those. To verify: - create a remote es output and set as default (both data and monitoring) - create an agent policy that uses default output - enroll an agent - expect that the agent sends system and elastic-agent metrics/logs to remote es - verify that the remote es health badge shows up on UI - set elasticsearch output back as default - verify that the agent policy has it as "default" in outputs section <img width="704" alt="image" src="https://github.com/elastic/kibana/assets/90178898/ab46b00d-efc2-49e1-ad7f-9acd44b2a9e5"> <img width="1251" alt="image" src="https://github.com/elastic/kibana/assets/90178898/a07c0d78-9126-43d9-bd0e-a4df193d7e78"> <img width="1791" alt="image" src="https://github.com/elastic/kibana/assets/90178898/868a054b-2cae-42f3-8f60-f2bff3b29efd"> <img width="715" alt="image" src="https://github.com/elastic/kibana/assets/90178898/721cd809-5f97-47e5-bf99-19f542d8ff83"> ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
allHealthy = false | ||
} | ||
} | ||
return allHealthy && foundApache && foundSystem | ||
}, runner.healthCheckTime, runner.healthCheckRefreshTime, "install never became healthy") | ||
}, runner.healthCheckTime, runner.healthCheckRefreshTime, "install never became healthy: components did not return a healthy state: %s", compDebugName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: printing out the entire last agent status (in a structured way) may be more useful for debugging
…arch type (elastic#179218) ## Summary Related to elastic#178857 and elastic#177927 It seems that using output id instead of "default" in full agent policy had a higher impact than expected. There are a few places where agent relies on the name "default". ([This](elastic/elastic-agent#4454) and [this](elastic/elastic-agent#4453) pr) Because of this, doing a partial revert, to keep using "default" for elasticsearch output type to avoid breaking change. However, for other types, using the output id. This will fix the original issue of remote output health reporting. I think it is a rarely used feature to use a non-elasticsearch output as default, so it shouldn't have a big impact to not use "default" output name for those. To verify: - create a remote es output and set as default (both data and monitoring) - create an agent policy that uses default output - enroll an agent - expect that the agent sends system and elastic-agent metrics/logs to remote es - verify that the remote es health badge shows up on UI - set elasticsearch output back as default - verify that the agent policy has it as "default" in outputs section <img width="704" alt="image" src="https://github.com/elastic/kibana/assets/90178898/ab46b00d-efc2-49e1-ad7f-9acd44b2a9e5"> <img width="1251" alt="image" src="https://github.com/elastic/kibana/assets/90178898/a07c0d78-9126-43d9-bd0e-a4df193d7e78"> <img width="1791" alt="image" src="https://github.com/elastic/kibana/assets/90178898/868a054b-2cae-42f3-8f60-f2bff3b29efd"> <img width="715" alt="image" src="https://github.com/elastic/kibana/assets/90178898/721cd809-5f97-47e5-bf99-19f542d8ff83"> ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
What does this PR do?
Attempt to fix #4447.
This increases the timeout on the health check, and improves the error message. As to why the test is flaky, my best guess right now is that the 3-minute timeout just isn't long enough on windows. Plan is just to re-run the test a bunch of times and...see what happens.