-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug(snapshot) : Do not preempt inside OnDbChange issue #829 #868
Conversation
adiholden
commented
Feb 22, 2023
- Serializer encode db index of entries
- Do not encode db index inside consume channel
- Allow push serialized data to channel from main snapshot loop and from journal callback (push may preempt)
Signed-off-by: adi_holden <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use the new helpers from helio EnterFiberAtomicSection
and FiberAtomicGuard
in the snapshotting loop that iterates over other snapshots to assert no suspensions are made
src/server/rdb_save.cc
Outdated
@@ -681,6 +686,7 @@ error_code RdbSerializer::FlushToSink(io::Sink* s) { | |||
// interrupt point. | |||
RETURN_ON_ERR(s->Write(bytes)); | |||
mem_buf_.ConsumeInput(bytes.size()); | |||
last_entry_db_index_ = kInvalidDbId; // After every flash we should write the DB index again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add that we need to do this because the blobs in the channel are interleaved and multiple savers can correspond to a single writer (in case of single file rdb snapshot)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flash -> flush. (flash is 📸 ).
src/server/rdb_save.cc
Outdated
// SELECTDB is serialized inside journal writer. In rdb loader when parsing journal blob | ||
// the journal reader parses the entry db index, and we dont use the main flow of | ||
// updating the current db index. Unless we change the flow in the loader, we must | ||
// update last_entry_db_index_ to invalid so that the next snapshot entry to be serialized | ||
// will update the db index. | ||
last_entry_db_index_ = kInvalidDbId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the journal db numbering and rdb db numbering are fully independent, right? Then it seems like we don't have to reset last_entry_db_index_?
// TODO: investigate why a single byte gets stuck and does not arrive to replica | ||
for (unsigned i = 10; i > 1; i--) | ||
CHECK(!default_serializer_->SendFullSyncCut()); | ||
FlushDefaultBuffer(true); | ||
CHECK(!serializer_->SendFullSyncCut()); | ||
PushSerializedToChannel(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that reminded me we still have this issue 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OMG, maybe we do not? I suggest removing this loop and see if the issue remains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have bug here
src/server/rdb_save.cc
Outdated
uint64_t expire_ms) { | ||
uint64_t expire_ms, DbIndex dbid) { | ||
SelectDb(dbid); | ||
last_entry_db_index_ = dbid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is SelectDb used anywhere else?
anyway, I think this line should be moved to SelectDb -line 261
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true this is left overs from my implementation before some fix, now its not relevant. will move inside
src/server/rdb_save.cc
Outdated
@@ -701,17 +707,20 @@ io::Bytes RdbSerializer::PrepareFlush() { | |||
return mem_buf_.InputBuffer(); | |||
} | |||
|
|||
error_code RdbSerializer::WriteJournalEntries(absl::Span<const journal::Entry> entries) { | |||
error_code RdbSerializer::WriteJournalEntries(const journal::Entry& entry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now it becomes singular WriteJournalEntry
src/server/snapshot.h
Outdated
// Return if flushed. | ||
bool FlushDefaultBuffer(bool force); | ||
// Push serializer's internal buffer to channel. | ||
// Push regradless of buffer size if force is true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regradless -> regardless
Signed-off-by: adi_holden <[email protected]>
Signed-off-by: adi_holden <[email protected]>
You didn't add the new FiberAtomicGuard, but we can do it separately when pulling new helio |