-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making //test/integration:hds_integration_test less flaky #4149
Conversation
Signed-off-by: Lilika Markatou <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing unexpected disconnects from HDS makes sense as I think we do with all fake xDS, however the increase timeouts still seem a bit scary. Is there any way to wait on things such that the timeouts aren't needed? Or should we make the timeouts impossibly large?
@mattklein123 we can go over the timeout robustness tomorrow, should we merge this to reduce test flakes meanwhile? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. SGTM. Happy to see a follow up to potentially extend the timeouts or do something a bit less fragile.
…obust (#4164) Follow up on #4149. I increased any unused timers to 100 seconds. Additionally, instead of waiting on a timer to go off, all but one tests wait on cluster stats to change. I decreased the timer wait times, reducing the test runtime from 25 seconds to 5. Exceptions: While TCP health checking, if an endpoint responds with a wrong phrase, the cluster failure stats do not change. So, the one test that examines this case still depends on timers. There are 2 tests that test hosts timing out. I set those timeouts to 0.1 seconds (vs 100). However, both tests still wait on the cluster stats to change. Risk Level: Low Testing: I run the test locally, and it passed 3000/3000 times. This work is for #1310. Signed-off-by: markatou <[email protected]>
…#1938) This is far from finished, but it reduces memory usage by ~10%. Pulling the following changes from github.com/envoyproxy/envoy: c1cc68d stats: refactoring MetricImpl without strings (envoyproxy#4190) 36809d8 fuzz: coverage profile generation instructions. (envoyproxy#4193) ba40cc9 upstream: rebuild cluster when health check config is changed (envoyproxy#4075) 05c0d52 build: use clang-6.0. (envoyproxy#4168) 01f403e thrift_proxy: introduce header transport (envoyproxy#4082) 564d256 tcp: allow connection pool callers to store protocol state (envoyproxy#4131) 3e1d643 configs: match latest API changes (envoyproxy#4185) bc6a10c Fix wrong mock function name. (envoyproxy#4187) e994c1c Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182) 3d1325e Converting envoy configs to V2 (envoyproxy#2957) 8d0680f Add timestamp to HealthCheckEvent definition (envoyproxy#4119) 497efb9 server: handle non-EnvoyExceptions safely if thrown in constructor. (envoyproxy#4173) 6d6fafd config: strengthen validation for gRPC config sources. (envoyproxy#4171) 132302c fuzz: reduce log level when running under fuzz engine. (envoyproxy#4161) 7c04ac2 test: fix V6EmptyOptions in coverage with IPv6 enable (envoyproxy#4155) 1b2219b ci: remove deprecated bazel --batch option (envoyproxy#4166) 2db6a4c ci: update clang to version 6.0 in the Ubuntu build image. (envoyproxy#4157) 71152b7 ratelimit: Add ratelimit custom response headers (envoyproxy#4015) 3062874 ssl: make Ssl::Connection const everywhere (envoyproxy#4179) 706e262 Handle validation of non expiring tokens in jwt_authn filter (envoyproxy#4007) f06e958 fuzz: tag trivial fuzzers with no_fuzz. (envoyproxy#4156) 27fb1d3 thrift_proxy: add service name matching to router implementation (envoyproxy#4130) 8c189a5 Make over provisioning factor configurable (envoyproxy#4003) 6c08fb4 Making test less flaky (envoyproxy#4149) 592775b fuzz: bare bones HCM fuzzer. (envoyproxy#4118) For istio/istio#7912. Signed-off-by: Piotr Sikora <[email protected]>
In #4135, //test/integration:hds_integration_test failed 7/100 times. I changed it such that the upstreams are allowed to unexpectedly disconnect, and the timeouts are longer.
Testing:
I run the test locally, and it passed 1000/1000 times.
Risk Level:
Low
This work is for #1310.