-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Data loss when docdb_filter_intent_ht is enabled #23890
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Comments
es1024
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Sep 11, 2024
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Sep 11, 2024
es1024
added a commit
that referenced
this issue
Sep 17, 2024
… min_replay_txn_start_ht Summary: This diff adds filtering for intent SST files during the transaction loading step of tablet bootstrap, using the newly introduced `min_replay_txn_start_ht`. When CDC is enabled, we persist intent SST files longer than they are otherwise needed, until CDC has streamed all transactions in the SST file and moved the retention barrier far enough ahead. This can lead to a large buildup of intent SST files which are not actually needed at bootstrap time. 8b23a4e / D35639 added changes to save `min_running_ht` periodically, and add intent SST file filtering during bootstrap time based on periodically saved values of `min_running_ht`. This can lead to data loss, if there is a transaction T such that the following is true: - T has been applied (APPLIED written to WALs) - T has intents that have been flushed to disk (this is rare but possible when CDC is disabled since in the ideal non-CDC case we never flush intents) - Changes made by T on regulardb have not been flushed - The metadata record for T is on an intent SST file whose max HT is less than min_running_ht after T apply (i.e. intentsdb flush happened between T writes and apply) - Tablet bootstrap state has been saved after T has committed These conditions will result in a `min_running_ht > T.start_time` being written to disk, and loaded during tablet bootstrap. Since regulardb changes have not been flushed, WAL replay will start from a point that includes T. However, transaction loader will not load T, because its metadata record has been excluded due to the SST file filter. This results in changes made by T being dropped, even though it has successfully committed. This change introduces a new `min_replay_txn_start_ht` and changes the intent SST file filter to be based off of periodically saved values of this new `min_replay_txn_start_ht`. `min_replay_txn_start_ht` is defined as the minimum of: - `min_running_ht` - `start_ht` of any transaction which may be read during WAL replay WAL replay begins at `bootstrap_start_op_id = min(intentsdb flushed_op_id, rocksdb flushed_op_id, retryable requests last_flushed_op_id)`. We calculate `min_replay_txn_start_ht` by maintaining a set of `(applied_op_id, start_ht)` for recently applied transactions. Transactions are added into this set when they are applied and cleaned from memory (removed from `transactions_`) and are removed when `bootstrap_start_op_id` is increased past `applied_op_id`. `min_replay_txn_start_ht` is then the minimum of `start_ht` of this set and `min_running_ht`. Since `replay_start_op_id` is only updated after flushes to disk, this ensures that any transaction whose metadata record is filtered out by the intent SST file filter will not be incorrectly loaded during WAL replay, since such a transaction would have `apply_op_id < replay_start_op_id` (the `replay_start_op_id` calculated at bootstrap time), so none of its records are read by WAL replay. **Upgrade/Rollback safety:** The `min_running_ht` field in `TabletBootstrapStatePB` was retired and a new `min_replay_txn_start_ht` field was added. There are no autoflags added because `min_replay_txn_start_ht` is only used for an optimization (intent SST file filtering) so the lack of its presence post-upgrade does not change correctness, and its presence post-rollback is simply ignored. `min_running_ht` was only used for a incorrect implementation of the optimization which was off by default, so the lack of its presence post-rollback does not harm correctness (and actually improves it if optimization was turned on) and its presence after upgrade is ignored. A different field was used for this change to ensure that values of `min_running_ht` set before upgrade are not used, since it is unsafe to use it. Jira: DB-12794 Test Plan: Added test case to reproduce the data loss scenario when filter was using `min_running_ht`: ``` ./yb_build.sh --cxx_test pgwrapper_pg_mini-test --gtest_filter PgMiniTestSingleNode.TestBootstrapOnAppliedTransactionWithIntents ``` Also confirmed that CDC stress tests stop failing after these changes. Reviewers: sergei, qhu Reviewed By: sergei Subscribers: rthallam, ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D37792
jasonyb
pushed a commit
that referenced
this issue
Sep 17, 2024
Summary: f97d7d5 [#23770] [#23797] [#23837] YSQL: Fix most non-regress tests when run with Connection Manager ee2b108 [docs] reverted PR 23909 changes (#23941) bd80f4e [#23924] DocDB: Address recent flakiness of PgGetLockStatusTest.TestGetWaitStart Excluded: f0a5db7 [#20908] YSQL: Introduce interface to optimize index non-key column updates 6556498 [#23897] xClusterDDLRepl: Support create table with partition by primary key e2b1d28 [#23363] Changing the default gflags of YSQL memory configuration. e717f43 [#23925] DocDB: Address recent regression of test PgWaitQueuesTest.MultiTabletFairness 09b7702 [#23940] YSQL: Replace copyright string from YugaByteDB to YugabyteDB add83ef [#23947] Update callhome URL to use https 69db717 Update faq page (#23704) Excluded: 063dbe5 [#23786] YSQL: yb_make_next_ddl_statement_nonincrementing Excluded: a913524 [#23859] YSQL: Remove redundant Bitmap Scan filters on partial indexes 41f5afd [PLAT-15097]: Delete System platform DB table while disabling ysql d6bbf59 [#23890] docdb: Add filtering for bootstrap intent iterators based on min_replay_txn_start_ht 0b37479 [#23770] YSQL: Stabalize TestPgExplainAnalyze#testExplainAnalyzeOptions the test with ysql connection manager Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D38123
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Jira Link: DB-12794
Description
When docdb_filter_intent_ht is enabled, we attempt to filter out intent SST files by min_running_ht during transaction loading step of bootstrap, but this is too aggressive and causes data loss. This has been turned off (#23184) and needs to be fixed.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: