testutils/testcluster: TestManualReplication failed #136801
testutils/testcluster.TestManualReplication failed on master @ 49cff91f3501494deaf038671bc643c194a0e3ca.
testutils/testcluster.TestManualReplication failed on master @ 8c44b55e124927e7fce8bb8c0276e43019a9d5ec.
testutils/testcluster.TestManualReplication failed on master @ f9df57e2bebd963d10ffd7fa52e4d37cf01b80df.
testutils/testcluster.TestManualReplication failed on master @ bc6d6e05a7c0f9ffd8103740239fdbc83fa78e3f.
testutils/testcluster.TestManualReplication failed on master @ 71fa055c4e4206522e371372e4eea9bc965e9d13.
testutils/testcluster.TestManualReplication failed on master @ b5d57fb09ba9c767a13cec561f77415c40a09d33.
Can't get this to repro after stressing it 3,000 times, despite the failures occurring fairly frequently. All of these failures take 9+ seconds to run, while locally the test passes in ~2 seconds. In the logs of the failures, the test usually hangs for ~5 seconds before finally outputting:
The test fails right after that. Perhaps the test hangs long enough for the lease to no longer be valid. Still trying to determine what is causing the hang, though.
Maybe I'm lucky, but I got two repros within seconds of running:
I intersected the metamorphic settings common to a bunch of failed runs; one suspicious commonality is:
After disabling it via [1] #137080
Okay, looks like I was just unlucky. Tried it again with
Reproing it a couple more times, I see the same
Handing over to @cockroachdb/kv to see what's up with raft leader fortification. Setting
testutils/testcluster.TestManualReplication failed on master @ 0b4d620740733ec61cf50ca26d19814299d91f8e.
When we transfer a lease from N1 to N2, N2 might not immediately know about the transfer and might still think that N1 holds the lease. With leader leases, however, store liveness can take some time to grant support, so the test can run into timing issues where N2 thinks that N1 has the lease, but the lease is UNUSABLE because now() is so close to the lease's min_expiration time. Sometimes N2 can't determine the lease validity altogether. This commit fixes this by wrapping the lease check that follows the transfer in a succeeds-soon retry. Fixes: cockroachdb#136801 Release Note: None
139238: testcluster: deflake TestManualReplication r=iskettaneh a=iskettaneh When we transfer a lease from N1 to N2, N2 might not immediately know about the transfer and might still think that N1 holds the lease. With leader leases, however, store liveness can take some time to grant support, so the test can run into timing issues where N2 thinks that N1 has the lease, but the lease is UNUSABLE because now() is so close to the lease's min_expiration time. Sometimes N2 can't determine the lease validity altogether. This commit fixes this by wrapping the lease check that follows the transfer in a succeeds-soon retry. Couldn't reproduce the bug after more than 10,000 attempts. Fixes: #136801 Release Note: None Co-authored-by: Ibrahim Kettaneh
testutils/testcluster.TestManualReplication failed on master @ a23be6bce928c3e08074a15815b3c67a657bb40e:
Parameters: attempt=1, run=23, shard=1
Jira issue: CRDB-45258