Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(snapshot) : Do not preempt inside OnDbChange issue #829 #868

Merged
merged 3 commits into from
Feb 26, 2023
Merged

Conversation

adiholden
Copy link
Collaborator

  1. Serializer encode db index of entries
  2. Do not encode db index inside consume channel
  3. Allow push serialized data to channel from main snapshot loop and from journal callback (push may preempt)

@adiholden adiholden changed the title bug(snapshot) : Do not preempt inside OnDbChange isuee #829 bug(snapshot) : Do not preempt inside OnDbChange issue #829 Feb 22, 2023
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use the new helpers from helio EnterFiberAtomicSection and FiberAtomicGuard in the snapshotting loop that iterates over other snapshots to assert no suspensions are made

@@ -681,6 +686,7 @@ error_code RdbSerializer::FlushToSink(io::Sink* s) {
// interrupt point.
RETURN_ON_ERR(s->Write(bytes));
mem_buf_.ConsumeInput(bytes.size());
last_entry_db_index_ = kInvalidDbId; // After every flash we should write the DB index again.
Copy link
Contributor

@dranikpg dranikpg Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add that we need to do this because the blobs in the channel are interleaved and multiple savers can correspond to a single writer (in case of single file rdb snapshot)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flash -> flush. (flash is 📸 ).

Comment on lines 718 to 723
// SELECTDB is serialized inside journal writer. In rdb loader when parsing journal blob
// the journal reader parses the entry db index, and we dont use the main flow of
// updating the current db index. Unless we change the flow in the loader, we must
// update last_entry_db_index_ to invalid so that the next snapshot entry to be serialized
// will update the db index.
last_entry_db_index_ = kInvalidDbId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the journal db numbering and rdb db numbering are fully independent, right? Then it seems like we don't have to reset last_entry_db_index_?

Comment on lines 160 to +163
// TODO: investigate why a single byte gets stuck and does not arrive to replica
for (unsigned i = 10; i > 1; i--)
CHECK(!default_serializer_->SendFullSyncCut());
FlushDefaultBuffer(true);
CHECK(!serializer_->SendFullSyncCut());
PushSerializedToChannel(true);
Copy link
Contributor

@dranikpg dranikpg Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that reminded me we still have this issue 😅

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG, maybe we do not? I suggest removing this loop and see if the issue remains.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have bug here

uint64_t expire_ms) {
uint64_t expire_ms, DbIndex dbid) {
SelectDb(dbid);
last_entry_db_index_ = dbid;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is SelectDb used anywhere else?
anyway, I think this line should be moved to SelectDb -line 261

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true this is left overs from my implementation before some fix, now its not relevant. will move inside

@@ -701,17 +707,20 @@ io::Bytes RdbSerializer::PrepareFlush() {
return mem_buf_.InputBuffer();
}

error_code RdbSerializer::WriteJournalEntries(absl::Span<const journal::Entry> entries) {
error_code RdbSerializer::WriteJournalEntries(const journal::Entry& entry) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it becomes singular WriteJournalEntry

// Return if flushed.
bool FlushDefaultBuffer(bool force);
// Push serializer's internal buffer to channel.
// Push regradless of buffer size if force is true.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regradless -> regardless

Signed-off-by: adi_holden <[email protected]>
Signed-off-by: adi_holden <[email protected]>
@dranikpg
Copy link
Contributor

You didn't add the new FiberAtomicGuard, but we can do it separately when pulling new helio

@adiholden adiholden merged commit 4b44fb6 into main Feb 26, 2023
@romange romange deleted the fix_829 branch March 1, 2023 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants