roachtest: cdc/mixed-versions failed #87251
Something is definitely off here. This test failed 3/5 times on master, and it wasn't as flaky before. I'll investigate further, but pinging @cockroachdb/cdc in the meantime for awareness.

cc @cockroachdb/cdc

roachtest.cdc/mixed-versions failed with artifacts on master @ c080f7ac2f5be11aedd787dceba77cca7df18a16:
Parameters:

roachtest.cdc/mixed-versions failed with artifacts on master @ b0f13c9bbef3e6628471d887672be7c7658f6511:
Parameters:

roachtest.cdc/mixed-versions failed with artifacts on master @ e5917605f751c569ddd90062250a40cafd7ccf30:
Parameters:

roachtest.cdc/mixed-versions failed with artifacts on master @ e39111b2e714375faa0facc05e51e8f619a55b21:
Parameters:

roachtest.cdc/mixed-versions failed with artifacts on master @ a82711442c65cf14489c55041b45b11a1e38415b:
Parameters: Same failure on other branches

After some more testing, if the updates are applied when nodes are not being restarted, no fingerprint mismatch errors are observed. This indicates that the issue is related to retrying the SQL operations in the fingerprint validator. I don't have the answer as to why retrying is problematic here, and I'll do some more investigation, but I believe this shouldn't be a release blocker, as the changefeed events we get are fine.

roachtest.cdc/mixed-versions failed with artifacts on master @ bc2e47da0523b347c28cf024707e80cd35d6c98a:
Parameters: Same failure on other branches

@renatolabs gonna assign it to you since it's test-code related

roachtest.cdc/mixed-versions failed with artifacts on master @ 2cfa6b1779ead01508c04e228bf72b4e0e96d98c:
Parameters: Same failure on other branches

roachtest.cdc/mixed-versions failed with artifacts on master @ 4780ee15194e7d22b661ac92254b13cd2b71dcb1:
Parameters: Same failure on other branches

roachtest.cdc/mixed-versions failed with artifacts on master @ 801bfc62afd7128be180e3396d21a1e0b2daa227:
Parameters: Same failure on other branches

roachtest.cdc/mixed-versions failed with artifacts on master @ e786cba2b137a671d3846cf7a33e7b9dea2854e6:
Parameters: Same failure on other branches

roachtest.cdc/mixed-versions failed with artifacts on master @ ff6961ef1741766b5cbe0b64ebf754e82343eb00:
Parameters: Same failure on other branches

roachtest.cdc/mixed-versions failed with artifacts on master @ 84384b50c023dd4c05fff76af85a6975f5d2b0ab:
Parameters: Same failure on other branches

roachtest.cdc/mixed-versions failed with artifacts on master @ 0eaeeb773474716753781289788fdd087fb9b166:
Parameters: Same failure on other branches

This commit updates the fingerprint validator (and its use in the `cdc/mixed-versions` test) to ignore duplicated events received by the validator. A previously implicit assumption of the validator is that any events that it receives are either not duplicated, or -- if they are duplicated -- they are within the previous resolved timestamp and the current resolved timestamp. However, that assumption is not justified by the changefeed guarantees and depends on how frequently `resolved` events are emitted and how often the changefeed checkpoints. In the specific case of the `cdc/mixed-versions` roachtest, it was possible for the changefeed to start from an old checkpoint (older than the last received `resolved` timestamp), causing it to re-emit old events that happened well before the previously observed resolved event. As a consequence, when the validator applies the update associated with that event, there is a mismatch with the state of the original table as of the update's timestamp, as the fingerprint validator relies on the fact that updates are applied in order. To fix the issue, we now skip events that happen before the timestamp of the previous `resolved` event received. In addition, the caller can also tell the validator to verify that such out-of-order messages received by the validator have indeed been previously seen; if not, that would represent a violation of the changefeed's guarantees. Fixes: cockroachdb#87251. Release note: None
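The skip-and-optionally-verify logic described in this commit can be sketched roughly as follows. This is a minimal, hypothetical Go sketch, not the actual `FingerprintValidator` code: the names `rowUpdate`, `dedupValidator`, `NoteRow` and `NoteResolved`, the use of plain integers for timestamps, and the choice to treat an update *at or below* the last resolved timestamp as a replay are all assumptions made for illustration.

```go
package main

import "fmt"

// rowUpdate is a hypothetical, simplified changefeed row event: a primary
// key plus the update timestamp (a plain integer instead of an HLC timestamp).
type rowUpdate struct {
	key string
	ts  int64
}

// dedupValidator is a hypothetical sketch: it skips updates that arrive at or
// below the last resolved timestamp and, when verifyDuplicates is set, checks
// that each skipped update had actually been received before.
type dedupValidator struct {
	lastResolved     int64
	verifyDuplicates bool
	seen             map[rowUpdate]bool
	applied          []rowUpdate // updates handed to the fingerprint logic, in order
}

func newDedupValidator(verifyDuplicates bool) *dedupValidator {
	return &dedupValidator{verifyDuplicates: verifyDuplicates, seen: make(map[rowUpdate]bool)}
}

// NoteRow either applies the update or skips it as a replay. Updates above the
// resolved timestamp are applied even if repeated (the original assumption).
func (v *dedupValidator) NoteRow(u rowUpdate) error {
	if u.ts <= v.lastResolved {
		// Replayed event, e.g. the changefeed restarted from an older checkpoint.
		if v.verifyDuplicates && !v.seen[u] {
			return fmt.Errorf("update %v below resolved timestamp %d was never seen", u, v.lastResolved)
		}
		return nil // skip: applying it now would rewind the replicated table state
	}
	v.seen[u] = true
	v.applied = append(v.applied, u)
	return nil
}

// NoteResolved records a resolved timestamp, ignoring any regression.
func (v *dedupValidator) NoteResolved(ts int64) {
	if ts > v.lastResolved {
		v.lastResolved = ts
	}
}

func main() {
	v := newDedupValidator(true)
	_ = v.NoteRow(rowUpdate{"a", 10})
	_ = v.NoteRow(rowUpdate{"b", 20})
	v.NoteResolved(25)

	// The changefeed restarts from an old checkpoint and re-emits both updates.
	fmt.Println(v.NoteRow(rowUpdate{"a", 10}), v.NoteRow(rowUpdate{"b", 20}))
	fmt.Println("applied:", len(v.applied))

	// An update below the resolved timestamp that was never seen is flagged.
	fmt.Println("flagged:", v.NoteRow(rowUpdate{"c", 15}) != nil)
}
```

The strict check is opt-in here because a replay that was never seen before is a violation of the changefeed's guarantees rather than a fingerprint problem, so a caller may prefer to leave that check to a dedicated ordering validator.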
roachtest.cdc/mixed-versions failed with artifacts on master @ d090ac9e42263b3fe8ec94d156bed056d06bedac:
Parameters: Same failure on other branches

87128: lint: Add commit/PR body linter for epic and issue refs r=nickvigilante a=nickvigilante

Before: Commits and PRs would inconsistently contain references to issues. Sometimes the author would forget to add it. Some authors might not think to add them. And there were no epic references provided.

Why: Adding issue and epic references to PRs and commits provides traceability and context for the revenue teams, product, management and engineers.

Now: A new GitHub Action workflow checks for references to issues the PR closes or informs and epics the PR is part of. It looks for references in the PR body and in commit messages and, if it doesn't find what it expects, it fails the check.

This PR is a continuation of #77654.
Fixes #77376
Release note: None
Release justification: Adding linter for better epic tracking

89332: roachtest: ignore duplicated events in fingerprint validator r=srosenberg a=renatolabs

This commit updates the fingerprint validator (and its use in the `cdc/mixed-versions` test) to ignore duplicated events received by the validator. A previously implicit assumption of the validator is that any events that it receives are either not duplicated, or -- if they are duplicated -- they are within the previous resolved timestamp and the current resolved timestamp. However, that assumption is not justified by the changefeed guarantees and depends on how frequently `resolved` events are emitted and how often the changefeed checkpoints. In the specific case of the `cdc/mixed-versions` roachtest, it was possible for the changefeed to start from an old checkpoint (older than the last received `resolved` timestamp), causing it to re-emit old events that happened well before the previously observed resolved event. As a consequence, when the validator applies the update associated with that event, there is a mismatch with the state of the original table as of the update's timestamp, as the fingerprint validator relies on the fact that updates are applied in order.

To fix the issue, we now skip events that happen before the timestamp of the previous `resolved` event received. In addition, the caller can also tell the validator to verify that such out-of-order messages received by the validator have indeed been previously seen; if not, that would represent a violation of the changefeed's guarantees.

Fixes: #87251.
Release note: None

89504: sql: fix panic in DROP ROLE when schemas have the same name r=ajwerner a=rafiss

fixes #89486
Release note (bug fix): Fix a crash that could occur when dropping a role that owned two schemas with the same name in different databases. The bug was introduced in v22.1.0.

89516: bazel,ci: check generated code and docs are up-to-date in CI r=rail a=rickystewart

When I created the file `build/bazelutil/checked_in_genfiles.txt` it was meant to contain an exhaustive list of all checked-in Go code, to be used by CI. In reality this file never had an exhaustive list and is not updated when new checked-in generated files are added. These days we have the `pkg/gen` infrastructure which *does* keep an exhaustive list of checked-in generated code, so the file is redundant anyway. Here I delete the file and just directly run `pkg/gen` for checking whether the generated code is up-to-date.

Closes #88744.
Release note: None

Co-authored-by: Nick Vigilante
Co-authored-by: Renato Costa
Co-authored-by: Rafi Shamim
Co-authored-by: Ricky Stewart
In recent work related to writing `cdc/mixed-versions` and debugging its failure (cockroachdb#87251), we introduced new functionality in the `FingerprintValidator` related to validating the ordering of messages received, particularly with respect to `resolved` events. However, that logic is duplicated as it is already implemented in the `OrderValidator`, which the roachtest already used. This removes timestamp validation logic from `FingerprintValidator`; the ordering validator should be used if the test intends to perform that validation. Epic: None. Release note: None.
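The separation of concerns in this follow-up (ordering checks live in one validator, fingerprinting in another, and a test composes whichever it needs) can be sketched roughly as follows. This is a hypothetical Go sketch only: `validator`, `orderCheck`, `fingerprintStub` and `fanOut` are invented names with simplified integer timestamps, and the real `cdctest` interfaces and signatures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// validator is a hypothetical, simplified interface: each validator sees
// every row update and resolved timestamp the changefeed emits.
type validator interface {
	NoteRow(key string, ts int64) error
	NoteResolved(ts int64) error
}

// orderCheck owns the ordering guarantee: a row at or below the last
// resolved timestamp is acceptable only as a duplicate of a row already seen.
type orderCheck struct {
	resolved int64
	seen     map[string]map[int64]bool
}

func (o *orderCheck) NoteRow(key string, ts int64) error {
	if o.seen == nil {
		o.seen = make(map[string]map[int64]bool)
	}
	if ts <= o.resolved && !o.seen[key][ts] {
		return fmt.Errorf("key %q at %d is below resolved %d but was never emitted", key, ts, o.resolved)
	}
	if o.seen[key] == nil {
		o.seen[key] = make(map[int64]bool)
	}
	o.seen[key][ts] = true
	return nil
}

func (o *orderCheck) NoteResolved(ts int64) error {
	if ts < o.resolved {
		return fmt.Errorf("resolved timestamp regressed from %d to %d", o.resolved, ts)
	}
	o.resolved = ts
	return nil
}

// fingerprintStub stands in for the fingerprint logic: it skips replays at or
// below the resolved timestamp and performs no ordering checks of its own.
type fingerprintStub struct {
	resolved int64
	applied  int
}

func (f *fingerprintStub) NoteRow(key string, ts int64) error {
	if ts > f.resolved {
		f.applied++ // a real validator would apply the update to its table copy here
	}
	return nil
}

func (f *fingerprintStub) NoteResolved(ts int64) error {
	if ts > f.resolved {
		f.resolved = ts
	}
	return nil
}

// fanOut forwards every event to all validators and joins their errors
// (errors.Join needs Go 1.20+ and returns nil when every error is nil).
type fanOut []validator

func (vs fanOut) NoteRow(key string, ts int64) error {
	var errs []error
	for _, v := range vs {
		errs = append(errs, v.NoteRow(key, ts))
	}
	return errors.Join(errs...)
}

func (vs fanOut) NoteResolved(ts int64) error {
	var errs []error
	for _, v := range vs {
		errs = append(errs, v.NoteResolved(ts))
	}
	return errors.Join(errs...)
}

func main() {
	order, fp := &orderCheck{}, &fingerprintStub{}
	vs := fanOut{order, fp}
	fmt.Println(vs.NoteRow("a", 10))
	fmt.Println(vs.NoteResolved(20))
	fmt.Println(vs.NoteRow("a", 10)) // replayed duplicate: accepted, not re-applied
	fmt.Println(vs.NoteRow("b", 15)) // never seen and below resolved: flagged by orderCheck
	fmt.Println("applied:", fp.applied)
}
```

In this sketch the fingerprint stub never returns an ordering error; a test that wants that guarantee adds `orderCheck` to the fan-out instead of configuring the fingerprint logic.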
89934: cdctest: simplify FingerprintValidator r=srosenberg a=renatolabs

In recent work related to writing `cdc/mixed-versions` and debugging its failure (#87251), we introduced new functionality in the `FingerprintValidator` related to validating the ordering of messages received, particularly with respect to `resolved` events. However, that logic is duplicated as it is already implemented in the `OrderValidator`, which the roachtest already used. This removes timestamp validation logic from `FingerprintValidator`; the ordering validator should be used if the test intends to perform that validation.

Epic: None.
Release note: None.

Co-authored-by: Renato Costa
roachtest.cdc/mixed-versions failed with artifacts on master @ b316a5ed5fe7253d113174d9d95ddebf1143b4e4:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_ssd=0
Jira issue: CRDB-19236
Epic CRDB-11732