Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two additional flaky test sources in endtoend tests #11743

Merged
merged 2 commits into from
Nov 16, 2022

Conversation

dbussink
Copy link
Contributor

@dbussink dbussink commented Nov 16, 2022

Description

This change fixes two more sources of flaky tests in endtoend tests. The first once is that base port allocation for tests is racy and it that it can allocate the same base port for two parallel endtoend tests.

The second one is that we can also collide with ephemeral client ports since we allow those to overlap with the listening ports. This can cause flaky tests as well if a component can't actually be started in endtoend tests.

See also the commit messages for more details.

Related Issue(s)

This is a follow up to #11707 which helped some CI hangs and timeouts but it now shows real underlying issues which could be caused by the two issues this PR fixes.

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

The logic here is racy since multiple endtoend tests can create the file
at around the same time and allocate the same base port. This then leads
to flaky failing endtoend tests since not all ports can be used.

Even with the listening check later on, this is still racy because the
listen check might run at a later point still trying to allocate the
same port for the same service in different endtoend test.

This copies the Go filelock implementation, but only for unix systems
since that's the only supported platform for Vitess.

Go has a proposal open to make filelock available, but that is still
pending in golang/go#33974. Once that is
resolved, we can remove our copy of the implementation.

Signed-off-by: Dirkjan Bussink <[email protected]>
Next to the previous commit which fixes the race base port computation,
we also need to avoid collisions between ephemeral outgoing ports to
127.0.0.1 and the listening ones.

The problem is that we set 1024 as the first ephemeral port, but that
means we can collide and trigger flaky tests with all the tests that
allocate a server port as well.

Given we start allocating server ports now starting at 6700 for endtoend
tests and we allocate in blocks of 200, we have room for 75+ concurrent
endtoend tests which should totally suffice.

We still have a range of 40+ ephemeral ports which should still be
sufficient as well.

Signed-off-by: Dirkjan Bussink <[email protected]>
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Nov 16, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we vendored the filelock package but assuming there's good reason.

🙇 THANK YOU for this! This is something that's been on my list for a while but I had never gotten to. ❤️

go/test/endtoend/filelock/filelock.go Show resolved Hide resolved
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!
We can merge once my one comment is addressed.

dev.env Show resolved Hide resolved
@vmg vmg merged commit 2e29fc9 into vitessio:main Nov 16, 2022
@vmg vmg deleted the dbussink/fix-more-test-races branch November 16, 2022 17:25
notfelineit pushed a commit to planetscale/vitess that referenced this pull request Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants