-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] [Consensus] WAL File Copy Bug #23335
Comments
This seems to be impacting children of split tablets. Added the backport labels. |
further context - The bug could result in log GC that violates time based policy controlled by flag The above violation seems possible only on the tablet split codepath which invokes the function |
…opied Summary: When a log segment is being copied partially, we weren't setting the footer's `close_timestamp_micros` for the new segment. This could result in log GC that violates time based policy controlled by flag `time_based_wal_gc_clock_delta_usec`, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way). `Log::GetSegmentsToGCUnlocked` -> `LogReader::GetSegmentPrefixNotIncluding` -> `LogReader::ViolatesMaxTimePolicy(segment)` wrongly infers that the age violates the max log retention policy. ``` bool LogReader::ViolatesMaxTimePolicy(const scoped_refptr<ReadableLogSegment>& segment) const { ... int64_t now = GetCurrentTimeMicros(); int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000; // this will infer wrong age because close ts is missing if (age_seconds > FLAGS_log_max_seconds_to_retain) { ... return true; } return false; } ``` and then, `Log::GetSegmentsToGCUnlocked` -> `Log::ApplyTimeRetentionPolicy` doesn't prevent this since the footer doesn't have the close ts. ``` void Log::ApplyTimeRetentionPolicy(SegmentSequence* segments_to_gc) const { int64_t now = GetCurrentTimeMicros() + FLAGS_time_based_wal_gc_clock_delta_usec; for (auto iter = segments_to_gc->begin(); iter != segments_to_gc->end(); ++iter) { const auto& segment = *iter; if (!segment->footer().has_close_timestamp_micros()) continue; // We let the segment be GC'ed if it doesn't have the field close_timestamp_micros set. int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000; if (age_seconds < wal_retention_secs()) { // Truncate the list of segments to GC here -- if this one is too new, then all later ones are // also too new. segments_to_gc->truncate(iter); break; } } } ``` The above violation seems possible only on the tablet split codepath which invokes the function `ReadableLogSegment::CopyTo`, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (`NO_OP`, `SNAPSHOT_OP`, `CLONE_OP`, `CHANGE_CONFIG_OP`). Additional note: The field `close_timestamp_micros` seems only relevant in the log GC codepath. so there shouldn't have been any other issues caused by this bug. This diff addresses the issue by setting the field `close_timestamp_micros` for the log segment being copied. Jira: DB-12262 Test Plan: ./yb_build.sh --cxx-test='TEST_F(LogTest, CopyUpTo) {' The above test fails without the fix. Added a new test where we attempt a leader change on the parent tablet after split, forcing the log copy path to invoke actual copy (duplication) as opposed to hard linking to the paren't log segment. ``` switch (relation) { case SegmentOpIdRelation::kOpIdBeforeSegment: return true; case SegmentOpIdRelation::kOpIdIsLast: stop = true; FALLTHROUGH_INTENDED; case SegmentOpIdRelation::kEmptySegment: FALLTHROUGH_INTENDED; case SegmentOpIdRelation::kOpIdAfterSegment: RETURN_NOT_OK(env->LinkFile(src_path, dest_path)); VLOG_WITH_PREFIX(1) << Format("Hard linked $0 to $1", src_path, dest_path); return stop; case SegmentOpIdRelation::kOpIdIsInsideAndNotLast: // Copy part of the segment up to and including max_included_op_id. RETURN_NOT_OK(segment->CopyTo( env, GetNewSegmentWritableFileOptions(), dest_path, max_included_op_id)); return true; } ``` ./yb_build.sh --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 this test fails consistently without the fix. Reviewers: timur Reviewed By: timur Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D36956
Summary: 43d4b49 [PLAT-14771]: Restrict user actions on releases page based on appropriate RBAC permissions 1473a23 [PLAT-14792]: Master and TServer nodes displays empty tool tip if nodes are unreachable Excluded: 78a2bc7 [#23114] YSQL: Refactoring the yb_pg_pgaudit.sql/.out test. da9b281 [doc][ybm] Prometheus integration (#23292) bd4874b [#13358] YSQL: Fix DDL atomicity stress test failure in tsan build edd8e3f [PLAT-14524] Undo all @JsonProperty annotations added earlier to fix APIs Excluded: 4a2657e [PLAT-14787] Implement a master tablet lb api for YW b9d2e9d [#23445]yugabyted: Node not starting with DNS name and `--secure` option f171e13 [#23447]yugabyted: Node doesn't restart with `--secure` enabled d4f036d [PLAT-14790] Region metadata is not populated when provisioned nodes are added via Node Agent when using assisted Manual Provisioning (provision_instance.py script) 71ab66f [doc][ybm] Backup and restore clarifications (#23400) 4768023 [doc] ysql_yb_bnl_batch_size flag (#23397) 3d4bc2a [#23117] YSQL: Enable ALTER VIEW in parser Excluded: 03bbbed Bumping version to 2.23.1.0 on branch master 622046d [#23335] DocDB: Set field close timestamp for the log segment being copied Excluded: f69b08f [#1999] YSQL: fix temp table INSERT ON CONFLICT Excluded: efd4cb7 [#23429] YSQL: fix INSERT ON CONFLICT TupleDesc ref leak Excluded: 9e7181f [PLAT-14785] Add REST APIs for job scheduler (auto-master failover) d56903c [PLAT-14850]API Token authentication loops through the users and checks token against each of these. Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37201
@zj11224 , Thanks for reporting the issue, Basava has now handled this bug. We will backport to stable branches soon. |
…segment being copied Summary: Original commit: 622046d / D36956 When a log segment is being copied partially, we weren't setting the footer's `close_timestamp_micros` for the new segment. This could result in log GC that violates time based policy controlled by flag `time_based_wal_gc_clock_delta_usec`, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way). `Log::GetSegmentsToGCUnlocked` -> `LogReader::GetSegmentPrefixNotIncluding` -> `LogReader::ViolatesMaxTimePolicy(segment)` wrongly infers that the age violates the max log retention policy. ``` bool LogReader::ViolatesMaxTimePolicy(const scoped_refptr<ReadableLogSegment>& segment) const { ... int64_t now = GetCurrentTimeMicros(); int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000; // this will infer wrong age because close ts is missing if (age_seconds > FLAGS_log_max_seconds_to_retain) { ... return true; } return false; } ``` and then, `Log::GetSegmentsToGCUnlocked` -> `Log::ApplyTimeRetentionPolicy` doesn't prevent this since the footer doesn't have the close ts. ``` void Log::ApplyTimeRetentionPolicy(SegmentSequence* segments_to_gc) const { int64_t now = GetCurrentTimeMicros() + FLAGS_time_based_wal_gc_clock_delta_usec; for (auto iter = segments_to_gc->begin(); iter != segments_to_gc->end(); ++iter) { const auto& segment = *iter; if (!segment->footer().has_close_timestamp_micros()) continue; // We let the segment be GC'ed if it doesn't have the field close_timestamp_micros set. int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000; if (age_seconds < wal_retention_secs()) { // Truncate the list of segments to GC here -- if this one is too new, then all later ones are // also too new. segments_to_gc->truncate(iter); break; } } } ``` The above violation seems possible only on the tablet split codepath which invokes the function `ReadableLogSegment::CopyTo`, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (`NO_OP`, `SNAPSHOT_OP`, `CLONE_OP`, `CHANGE_CONFIG_OP`). Additional note: The field `close_timestamp_micros` seems only relevant in the log GC codepath. so there shouldn't have been any other issues caused by this bug. This diff addresses the issue by setting the field `close_timestamp_micros` for the log segment being copied. Jira: DB-12262 Test Plan: ./yb_build.sh --cxx-test='TEST_F(LogTest, CopyUpTo) {' The above test fails without the fix. Added a new test where we attempt a leader change on the parent tablet after split, forcing the log copy path to invoke actual copy (duplication) as opposed to hard linking to the paren't log segment. ``` switch (relation) { case SegmentOpIdRelation::kOpIdBeforeSegment: return true; case SegmentOpIdRelation::kOpIdIsLast: stop = true; FALLTHROUGH_INTENDED; case SegmentOpIdRelation::kEmptySegment: FALLTHROUGH_INTENDED; case SegmentOpIdRelation::kOpIdAfterSegment: RETURN_NOT_OK(env->LinkFile(src_path, dest_path)); VLOG_WITH_PREFIX(1) << Format("Hard linked $0 to $1", src_path, dest_path); return stop; case SegmentOpIdRelation::kOpIdIsInsideAndNotLast: // Copy part of the segment up to and including max_included_op_id. RETURN_NOT_OK(segment->CopyTo( env, GetNewSegmentWritableFileOptions(), dest_path, max_included_op_id)); return true; } ``` ./yb_build.sh --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 this test fails consistently without the fix. Reviewers: timur Reviewed By: timur Subscribers: ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37178
…g segment being copied Summary: Original commit: 622046d / D36956 When a log segment is being copied partially, we weren't setting the footer's `close_timestamp_micros` for the new segment. This could result in log GC that violates time based policy controlled by flag `time_based_wal_gc_clock_delta_usec`, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way). `Log::GetSegmentsToGCUnlocked` -> `LogReader::GetSegmentPrefixNotIncluding` -> `LogReader::ViolatesMaxTimePolicy(segment)` wrongly infers that the age violates the max log retention policy. ``` bool LogReader::ViolatesMaxTimePolicy(const scoped_refptr<ReadableLogSegment>& segment) const { ... int64_t now = GetCurrentTimeMicros(); int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000; // this will infer wrong age because close ts is missing if (age_seconds > FLAGS_log_max_seconds_to_retain) { ... return true; } return false; } ``` and then, `Log::GetSegmentsToGCUnlocked` -> `Log::ApplyTimeRetentionPolicy` doesn't prevent this since the footer doesn't have the close ts. ``` void Log::ApplyTimeRetentionPolicy(SegmentSequence* segments_to_gc) const { int64_t now = GetCurrentTimeMicros() + FLAGS_time_based_wal_gc_clock_delta_usec; for (auto iter = segments_to_gc->begin(); iter != segments_to_gc->end(); ++iter) { const auto& segment = *iter; if (!segment->footer().has_close_timestamp_micros()) continue; // We let the segment be GC'ed if it doesn't have the field close_timestamp_micros set. int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000; if (age_seconds < wal_retention_secs()) { // Truncate the list of segments to GC here -- if this one is too new, then all later ones are // also too new. segments_to_gc->truncate(iter); break; } } } ``` The above violation seems possible only on the tablet split codepath which invokes the function `ReadableLogSegment::CopyTo`, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (`NO_OP`, `SNAPSHOT_OP`, `CLONE_OP`, `CHANGE_CONFIG_OP`). Additional note: The field `close_timestamp_micros` seems only relevant in the log GC codepath. so there shouldn't have been any other issues caused by this bug. This diff addresses the issue by setting the field `close_timestamp_micros` for the log segment being copied. Jira: DB-12262 Test Plan: ./yb_build.sh --cxx-test='TEST_F(LogTest, CopyUpTo) {' The above test fails without the fix. Added a new test where we attempt a leader change on the parent tablet after split, forcing the log copy path to invoke actual copy (duplication) as opposed to hard linking to the paren't log segment. ``` switch (relation) { case SegmentOpIdRelation::kOpIdBeforeSegment: return true; case SegmentOpIdRelation::kOpIdIsLast: stop = true; FALLTHROUGH_INTENDED; case SegmentOpIdRelation::kEmptySegment: FALLTHROUGH_INTENDED; case SegmentOpIdRelation::kOpIdAfterSegment: RETURN_NOT_OK(env->LinkFile(src_path, dest_path)); VLOG_WITH_PREFIX(1) << Format("Hard linked $0 to $1", src_path, dest_path); return stop; case SegmentOpIdRelation::kOpIdIsInsideAndNotLast: // Copy part of the segment up to and including max_included_op_id. RETURN_NOT_OK(segment->CopyTo( env, GetNewSegmentWritableFileOptions(), dest_path, max_included_op_id)); return true; } ``` ./yb_build.sh --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 this test fails consistently without the fix. Reviewers: timur Reviewed By: timur Subscribers: ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37177
…mpInFooter Summary: Test `TabletSplitITest.TestLogCopySetsCloseTimestampInFooter` was put up to assert that the footer timestamp was being rightly set for log segments when duplicating the segment for a split child tablet. The test was trying to make the following assertion ``` for (const auto& entry : entries_copy_result.entries) { if (entry->has_replicate()) { has_replicated_entries = true; break; } } ASSERT_EQ(has_replicated_entries, segment->footer().has_close_timestamp_micros()) << "T " << peer->tablet_id() << " P " << peer->permanent_uuid() << ": Expected valid close timestamp for segment with replicated entries."; } ``` where we assert that the segment would have a valid footer timestamp only when the `segment->HasFooter()` and the segment has non-zero replicate entries. But from the code, we can see that there exists cases where a segment has a valid footer and has 0 replicate entries ``` if (!footer_builder_.has_min_replicate_index()) { VLOG_WITH_PREFIX(1) << "Writing a segment without any REPLICATE message. Segment: " << active_segment_->path(); } ``` Additionally, detective failure logs confirm this to be the case ``` ../../src/yb/integration-tests/tablet-split-itest.cc:966: Failure Expected equality of these values: has_replicated_entries Which is: false segment->footer().has_close_timestamp_micros() Which is: true ``` The diff fixes the test issue by asserting that the log segment has a valid footer close timestamp when `segment->HasFooter()` is true. This assertion should hold since the footer should always have a valid close timestamp. Jira: DB-13553 Test Plan: Jenkins: compile-only ./yb_build.sh release --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 Note: The modified test rightly fails when source code changes fixing #23335 are reverted, which is the required behavior. Reviewers: timur, rthallam Reviewed By: timur Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39615
…ySetsCloseTimestampInFooter Summary: Original commit: 4785122 / D39615 Test `TabletSplitITest.TestLogCopySetsCloseTimestampInFooter` was put up to assert that the footer timestamp was being rightly set for log segments when duplicating the segment for a split child tablet. The test was trying to make the following assertion ``` for (const auto& entry : entries_copy_result.entries) { if (entry->has_replicate()) { has_replicated_entries = true; break; } } ASSERT_EQ(has_replicated_entries, segment->footer().has_close_timestamp_micros()) << "T " << peer->tablet_id() << " P " << peer->permanent_uuid() << ": Expected valid close timestamp for segment with replicated entries."; } ``` where we assert that the segment would have a valid footer timestamp only when the `segment->HasFooter()` and the segment has non-zero replicate entries. But from the code, we can see that there exists cases where a segment has a valid footer and has 0 replicate entries ``` if (!footer_builder_.has_min_replicate_index()) { VLOG_WITH_PREFIX(1) << "Writing a segment without any REPLICATE message. Segment: " << active_segment_->path(); } ``` Additionally, detective failure logs confirm this to be the case ``` ../../src/yb/integration-tests/tablet-split-itest.cc:966: Failure Expected equality of these values: has_replicated_entries Which is: false segment->footer().has_close_timestamp_micros() Which is: true ``` The diff fixes the test issue by asserting that the log segment has a valid footer close timestamp when `segment->HasFooter()` is true. This assertion should hold since the footer should always have a valid close timestamp. Jira: DB-13553 Test Plan: Jenkins: compile-only ./yb_build.sh release --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 Note: The modified test rightly fails when source code changes fixing #23335 are reverted, which is the required behavior. Reviewers: timur, rthallam Reviewed By: rthallam Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39731
…ySetsCloseTimestampInFooter Summary: Original commit: 4785122 / D39615 Test `TabletSplitITest.TestLogCopySetsCloseTimestampInFooter` was put up to assert that the footer timestamp was being rightly set for log segments when duplicating the segment for a split child tablet. The test was trying to make the following assertion ``` for (const auto& entry : entries_copy_result.entries) { if (entry->has_replicate()) { has_replicated_entries = true; break; } } ASSERT_EQ(has_replicated_entries, segment->footer().has_close_timestamp_micros()) << "T " << peer->tablet_id() << " P " << peer->permanent_uuid() << ": Expected valid close timestamp for segment with replicated entries."; } ``` where we assert that the segment would have a valid footer timestamp only when the `segment->HasFooter()` and the segment has non-zero replicate entries. But from the code, we can see that there exists cases where a segment has a valid footer and has 0 replicate entries ``` if (!footer_builder_.has_min_replicate_index()) { VLOG_WITH_PREFIX(1) << "Writing a segment without any REPLICATE message. Segment: " << active_segment_->path(); } ``` Additionally, detective failure logs confirm this to be the case ``` ../../src/yb/integration-tests/tablet-split-itest.cc:966: Failure Expected equality of these values: has_replicated_entries Which is: false segment->footer().has_close_timestamp_micros() Which is: true ``` The diff fixes the test issue by asserting that the log segment has a valid footer close timestamp when `segment->HasFooter()` is true. This assertion should hold since the footer should always have a valid close timestamp. Jira: DB-13553 Test Plan: Jenkins: compile-only ./yb_build.sh release --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 Note: The modified test rightly fails when source code changes fixing #23335 are reverted, which is the required behavior. Reviewers: timur, rthallam Reviewed By: rthallam Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39732
…etsCloseTimestampInFooter Summary: Original commit: 4785122 / D39615 Test `TabletSplitITest.TestLogCopySetsCloseTimestampInFooter` was put up to assert that the footer timestamp was being rightly set for log segments when duplicating the segment for a split child tablet. The test was trying to make the following assertion ``` for (const auto& entry : entries_copy_result.entries) { if (entry->has_replicate()) { has_replicated_entries = true; break; } } ASSERT_EQ(has_replicated_entries, segment->footer().has_close_timestamp_micros()) << "T " << peer->tablet_id() << " P " << peer->permanent_uuid() << ": Expected valid close timestamp for segment with replicated entries."; } ``` where we assert that the segment would have a valid footer timestamp only when the `segment->HasFooter()` and the segment has non-zero replicate entries. But from the code, we can see that there exists cases where a segment has a valid footer and has 0 replicate entries ``` if (!footer_builder_.has_min_replicate_index()) { VLOG_WITH_PREFIX(1) << "Writing a segment without any REPLICATE message. Segment: " << active_segment_->path(); } ``` Additionally, detective failure logs confirm this to be the case ``` ../../src/yb/integration-tests/tablet-split-itest.cc:966: Failure Expected equality of these values: has_replicated_entries Which is: false segment->footer().has_close_timestamp_micros() Which is: true ``` The diff fixes the test issue by asserting that the log segment has a valid footer close timestamp when `segment->HasFooter()` is true. This assertion should hold since the footer should always have a valid close timestamp. Jira: DB-13553 Test Plan: Jenkins: compile-only ./yb_build.sh release --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1 Note: The modified test rightly fails when source code changes fixing #23335 are reverted, which is the required behavior. Reviewers: timur, rthallam Reviewed By: rthallam Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39733
Jira Link: DB-12262
Description
Is there an error in the ReadableLogSegment::CopyTo function at log_util.cc:391, where the closing timestamp for copying the source file is reset. Why change the original WAL file data instead of changing the close timestamp of the new file generated after copying?
Is there an error at log_util.cc:391 where it should be footer instead of footer_.
This could cause the GC to think the log is too old and delete it, but the file could be a very recent file instead because the close timestamp was set incorrectly when it was copied?
Can you answer my query, thanks.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: