Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(replication): Use a ring buffer with messages to serve replication. #1835

Merged
merged 6 commits into from
Sep 18, 2023

Conversation

royjacobson
Copy link
Contributor

@royjacobson royjacobson commented Sep 10, 2023

This is infrastructure changes needed for partial sync, but no actual replication protocol changes are implemented here.
The journal ring buffer is enabled, some functionality is introduced, and the existing replication infrastructure is migrated to use the ring buffer instead of serializing in-place.

Copy link
Collaborator

@chakaz chakaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments, mostly questions because of my lack of context

#include "base/logging.h"
#include "server/journal/serializer.h"

ABSL_FLAG(int, shard_changes_log_size, 1 << 16, "The size of the circular changes log per shard");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the work "replication" in the flag name, or at least in the description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to bike shed here :)

For reference, the equivalent redis configuration is called repl-backlog-size and it's measured in bytes (not entries)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think of it as bike-shedding, it's actually something the users might look for. So ctrl-f in the output of --helpfull or whatever can be useful to contain that word..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to shard_repl_backlog_len

src/server/journal/journal_slice.cc Outdated Show resolved Hide resolved
src/server/journal/journal_slice.cc Show resolved Hide resolved
src/server/journal/journal_slice.cc Outdated Show resolved Hide resolved
src/server/rdb_load.cc Show resolved Hide resolved
src/server/rdb_save.cc Show resolved Hide resolved
@royjacobson royjacobson force-pushed the ring_buffer_improvements branch from 5d8599d to b688080 Compare September 11, 2023 12:29
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few cleaning comments

src/server/rdb_save.cc Show resolved Hide resolved
src/server/journal/streamer.cc Show resolved Hide resolved
@dranikpg
Copy link
Contributor

A thought I had. Instead of using RingBufferstd::string we can use two rotating byte buffers (like the streamer does) to control offset not only in entries, but also in bytes (for example a max lag of 5mb is allowed). Also, we don't need separate allocations for all entries (which are likely fairly cheap either way). The downside that memory usage is twice higher and you drop half of the entries when filling up (or you keep track with an external vector which makes it difficult)

@romange
Copy link
Collaborator

romange commented Sep 12, 2023

Why did we punt on the design review? I am curious to know if we still maintain back-pressure when replicating via ring buffer.

item->opcode = entry.opcode;
item->data = "";
} else {
item = ring_buffer_->GetTail(true);
Copy link
Collaborator

@romange romange Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please add comment saying, it provide a pointer to next tail, overriding the head if buffer is full.
  2. I would like to understand how you envision product behavior in this case. Are you saying, there is no backpressure whatsoever, and if this item has not been consumed by a replica, then cest la vie?
  3. How replica will know that its stable sync channel was overridden and its missing some entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote below - currently no change, backpressure is applied through the callbacks so we can't override an entry that hasn't been sent.

item.txid = entry.txid;
VLOG(1) << "Writing item [" << item.lsn << "]: " << entry.ToString();
ring_buffer_->EmplaceOrOverride(move(item));
io::BufSink buf_sink{&ring_serialize_buf_};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using StringSink is better here.
you serialize into a string and then just move into item->data saving a copy at line 155.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping a big backing buffer will prevent grow allocations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also just serialize directly into item->data without moving, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, StringSink owns its string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So - if I always move out of a StringBuf I always have to allocate a new string. Where as currently I'm reusing the strings from the ring buffer. I think less allocations is better than less copies in this case

Copy link
Contributor Author

@royjacobson royjacobson Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? item is a pointer to an object on the ring buffer and I'm calling std::string::operator=(std::string_view)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its currently allocating a single string per entry - but exactly once. A moved-from string sink can cause multiple allocations because the string might need to grow after multiple write calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where any allocation happens here (except for the first 16k entries)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not remember we reuse items in the ring buffer. It's worth adding a comment because it can bite us later by hiding allocated memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I thought about this again. I agree that memory usage can be an issue. I like @dranikpg's suggestion about two buffers (one for entries, one for bytes) but it's a bit more tricky to handle.
So I suggest for now to lower the default size (which is currently 64k, I didn't remember correctly) to 1k and later we can adjust this better and add visibility into the memory consumption.

@royjacobson
Copy link
Contributor Author

Why did we punt on the design review? I am curious to know if we still maintain back-pressure when replicating via ring buffer.

I left it as an open question.

We still have backpressure and I don't plan to change that right now, but it would be pretty easy to do so.

@royjacobson
Copy link
Contributor Author

A thought I had. Instead of using RingBufferstd::string we can use two rotating byte buffers (like the streamer does) to control offset not only in entries, but also in bytes (for example a max lag of 5mb is allowed). Also, we don't need separate allocations for all entries (which are likely fairly cheap either way). The downside that memory usage is twice higher and you drop half of the entries when filling up (or you keep track with an external vector which makes it difficult)

I agree. I think it's a pretty simple change but I'd rather do it in another PR if we decide we want to.

adiholden
adiholden previously approved these changes Sep 13, 2023
dranikpg
dranikpg previously approved these changes Sep 14, 2023
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't have any further comments and agree that we can try out different ring buffer variations later

@royjacobson royjacobson merged commit db21b73 into main Sep 18, 2023
@royjacobson royjacobson deleted the ring_buffer_improvements branch September 18, 2023 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants