Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

staging-v22.2.18: release-22.2: rangefeed: fix premature checkpoint due to intent resolution race #118633

Merged

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented Feb 2, 2024

Backport 2/2 commits from #118477 on behalf of @erikgrinaker.

/cc @cockroachdb/release


Only the 2 last commits should be reviewed here, the earlier commits in this PR are from the following prerequisite backports:

This backport includes new RPC protocol additions that are necessary to fix the bug. See discussion in 23.2 backport: #118413.

This backport had no significant conflicts.


Backport 2/2 commits from #117612. Also pull in parts of #118469 and #118265.

Release justification: fixes a bug which could cause changefeeds to omit events in some scenarios.

/cc @cockroachdb/release


It was possible for rangefeeds to emit a premature checkpoint, before all writes below its timestamp had been emitted. This in turn would cause changefeeds to not emit these write events at all.

This could happen because PushTxn may return a false ABORTED status for a transaction that has in fact been committed, if the transaction record has been GCed (after resolving all intents). The timestamp cache does not retain sufficient information to disambiguate a committed transaction from an aborted one in this case, so it pessimistically
assumes an abort (see Replica.CanCreateTxnRecord and batcheval.SynthesizeTxnFromMeta).

However, the rangefeed txn pusher trusted this ABORTED status, ignoring the pending txn intents and allowing the resolved timestamp to advance past them before emitting the committed intents. This can lead to the following scenario:

  • A rangefeed is running on a lagging follower.
  • A txn writes an intent, which is replicated to the follower.
  • The closed timestamp advances past the intent.
  • The txn commits and resolves the intent at the original write timestamp, then GCs its txn record. This is not yet applied on the follower.
  • The rangefeed pushes the txn to advance its resolved timestamp.
  • The txn is GCed, so the push returns ABORTED (it can't know whether the txn was committed or aborted after its record is GCed).
  • The rangefeed disregards the "aborted" txn and advances the resolved timestamp, emitting a checkpoint.
  • The follower applies the resolved intent and emits an event below the checkpoint, violating the checkpoint guarantee.
  • The changefeed sees an event below its frontier and drops it, never emitting these events at all.

This patch fixes the bug by submitting a barrier command to the leaseholder which waits for all past and ongoing writes (including intent resolution) to complete and apply, and then waits for the local replica to apply the barrier as well. This ensures any committed intent resolution will be applied and emitted before the transaction is removed from resolved timestamp tracking.

Resolves #104309.
Epic: none
Release note (bug fix): fixed a bug where a changefeed could omit events in rare cases, logging the error "cdc ux violation: detected timestamp ... that is less or equal to the local frontier". This can happen if a rangefeed runs on a follower replica that lags significantly behind the leaseholder, a transaction commits and removes its transaction record before its intent resolution is applied on the follower, the follower's closed timestamp has advanced past the transaction commit timestamp, and the rangefeed attempts to push the transaction to a new timestamp (at least 10 seconds after the transaction began). This may cause the rangefeed to prematurely emit a checkpoint before emitting writes at lower timestamps, which in turn may cause the changefeed to drop these events entirely, never emitting them.


Release justification:

It was possible for rangefeeds to emit a premature checkpoint, before
all writes below its timestamp had been emitted. This in turn would
cause changefeeds to not emit these write events at all.

This could happen because `PushTxn` may return a false `ABORTED` status
for a transaction that has in fact been committed, if the transaction
record has been GCed (after resolving all intents). The timestamp cache
does not retain sufficient information to disambiguate a committed
transaction from an aborted one in this case, so it pessimistically
assumes an abort (see `Replica.CanCreateTxnRecord` and
`batcheval.SynthesizeTxnFromMeta`).

However, the rangefeed txn pusher trusted this `ABORTED` status,
ignoring the pending txn intents and allowing the resolved timestamp to
advance past them before emitting the committed intents. This can lead to
the following scenario:

- A rangefeed is running on a lagging follower.
- A txn writes an intent, which is replicated to the follower.
- The closed timestamp advances past the intent.
- The txn commits and resolves the intent at the original write timestamp,
  then GCs its txn record. This is not yet applied on the follower.
- The rangefeed pushes the txn to advance its resolved timestamp.
- The txn is GCed, so the push returns ABORTED (it can't know whether the
  txn was committed or aborted after its record is GCed).
- The rangefeed disregards the "aborted" txn and advances the resolved
  timestamp, emitting a checkpoint.
- The follower applies the resolved intent and emits an event below
  the checkpoint, violating the checkpoint guarantee.
- The changefeed sees an event below its frontier and drops it, never
  emitting these events at all.

This patch fixes the bug by submitting a barrier command to the
leaseholder which waits for all past and ongoing writes (including
intent resolution) to complete and apply, and then waits for the local
replica to apply the barrier as well. This ensures any committed intent
resolution will be applied and emitted before the transaction is removed
from resolved timestamp tracking.

Epic: none
Release note (bug fix): fixed a bug where a changefeed could omit events
in rare cases, logging the error "cdc ux violation: detected timestamp
... that is less or equal to the local frontier". This can happen if a
rangefeed runs on a follower replica that lags significantly behind the
leaseholder, a transaction commits and removes its transaction record
before its intent resolution is applied on the follower, the follower's
closed timestamp has advanced past the transaction commit timestamp, and
the rangefeed attempts to push the transaction to a new timestamp (at
least 10 seconds after the transaction began). This may cause the
rangefeed to prematurely emit a checkpoint before emitting writes at
lower timestamps, which in turn may cause the changefeed to drop these
events entirely, never emitting them.
@blathers-crl blathers-crl bot force-pushed the blathers/backport-staging-v22.2.18-118477 branch from cb22102 to 9c9ac22 Compare February 2, 2024 09:16
@blathers-crl blathers-crl bot requested a review from a team as a code owner February 2, 2024 09:16
@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Feb 2, 2024
@erikgrinaker erikgrinaker merged commit 6980bf2 into staging-v22.2.18 Feb 2, 2024
3 checks passed
@erikgrinaker erikgrinaker deleted the blathers/backport-staging-v22.2.18-118477 branch February 2, 2024 09:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants