-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix two additional flaky test sources in endtoend tests #11743
Conversation
The logic here is racy since multiple endtoend tests can create the file at around the same time and allocate the same base port. This then leads to flaky failing endtoend tests since not all ports can be used. Even with the listening check later on, this is still racy because the listen check might run at a later point still trying to allocate the same port for the same service in different endtoend test. This copies the Go filelock implementation, but only for unix systems since that's the only supported platform for Vitess. Go has a proposal open to make filelock available, but that is still pending in golang/go#33974. Once that is resolved, we can remove our copy of the implementation. Signed-off-by: Dirkjan Bussink <[email protected]>
Next to the previous commit which fixes the race base port computation, we also need to avoid collisions between ephemeral outgoing ports to 127.0.0.1 and the listening ones. The problem is that we set 1024 as the first ephemeral port, but that means we can collide and trigger flaky tests with all the tests that allocate a server port as well. Given we start allocating server ports now starting at 6700 for endtoend tests and we allocate in blocks of 200, we have room for 75+ concurrent endtoend tests which should totally suffice. We still have a range of 40+ ephemeral ports which should still be sufficient as well. Signed-off-by: Dirkjan Bussink <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we vendored the filelock package but assuming there's good reason.
🙇 THANK YOU for this! This is something that's been on my list for a while but I had never gotten to. ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
We can merge once my one comment is addressed.
…) (vitessio#1331) Signed-off-by: Dirkjan Bussink <[email protected]> Signed-off-by: Dirkjan Bussink <[email protected]>
Description
This change fixes two more sources of flaky tests in endtoend tests. The first once is that base port allocation for tests is racy and it that it can allocate the same base port for two parallel endtoend tests.
The second one is that we can also collide with ephemeral client ports since we allow those to overlap with the listening ports. This can cause flaky tests as well if a component can't actually be started in endtoend tests.
See also the commit messages for more details.
Related Issue(s)
This is a follow up to #11707 which helped some CI hangs and timeouts but it now shows real underlying issues which could be caused by the two issues this PR fixes.
Checklist