Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testcluster: deflake TestManualReplication #139238

Merged
merged 1 commit into from
Jan 16, 2025

Conversation

iskettaneh
Copy link
Contributor

@iskettaneh iskettaneh commented Jan 16, 2025

When we transfer a lease from N1 to N2, N2 might not immediately know about the lease transfer, and might still think that N1 holds the lease. With leader leases however, since sometimes it takes time until store liveness grants support, the test might run into timing issues where N2 thinks that N1 has the lease, but the lease is UNUSABLE since now() is so close to the lease min_expiration time. Also, sometimes N2 can't determine the lease validity all together.

This commit fixes this by wrapping the lease enquiry after the lease transfer by a succeeds soon.

Couldn't reproduce the bug after more than 10,000 attempts.

Fixes: #136801

Release Note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@iskettaneh iskettaneh requested a review from arulajmani January 16, 2025 18:56
@iskettaneh iskettaneh marked this pull request as ready for review January 16, 2025 18:57
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @iskettaneh)


pkg/testutils/testcluster/testcluster_test.go line 153 at r1 (raw file):

	// the hint, since it's guaranteed that the old lease holder has applied the
	// new lease.
	testutils.SucceedsSoon(t, func() error {

nit: consider adding a comment explaining why we have wrapped this in a succeeds soon. Specifically, that N2 may not have applied the lease, so it may return an error in FindRangeLeaseHolder.

When we transfer a lease from N1 to N2, N2 might not immediately know
about the lease transfer, and might still think that N1 holds the lease.
With leader leases however, since sometimes it takes time until store
liveness grants support, the test might run into timing issues where N2
thinks that N1 has the lease, but the lease is UNUSABLE since now() is
so close to the lease min_expiration time. Also, sometimes N2 can't
determine the lease validity all together.

This commit fixes this by wrapping the lease enquiry after the lease
transfer by a succeeds soon.

Fixes: cockroachdb#136801

Release note: None
@iskettaneh
Copy link
Contributor Author

TFTR!

bors r+

@craig craig bot merged commit 521d415 into cockroachdb:master Jan 16, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

testutils/testcluster: TestManualReplication failed
3 participants